AD-A074 833  LOGICON INC SAN DIEGO CA TACTICAL AND TRAINING SYSTE-- ETC  F/G 9/4
VOICE INTERACTIVE ANALYSIS SYSTEM STUDY. (U)
JUN 79  D P HARRY, J E PORTER, W J SATZER  N61339-78-C-0141
NAVTRAEQUIPC-78-C-0141-1  NL


UNCLASSIFIED 


VOICE  INTERACTIVE  ANALYSIS  SYSTEM  STUDY 


D.P.  Harry,  J.E.  Porter  and  W.J.  Satzer 
Logicon,  Inc. 

Tactical & Training Systems Division

Post Office Box 80158

San  Diego,  California  92138 


June  1979 


FINAL  REPORT  28  AUGUST  1978  -  23  MARCH  1979 


DoD Distribution Statement

Approved  for  public  release; 
distribution  unlimited. 


NAVAL  TRAINING  EQUIPMENT  CENTER 
ORLANDO,  FLORIDA  32813 




REPORT DOCUMENTATION PAGE

Report number: NAVTRAEQUIPC
Title: Voice Interactive Analysis System Study
Type of report and period covered: Technical Report - Final,
    28 AUG 1978 - 23 MAR 1979
Contract or grant number: N61339-78-C-0141
Performing organization name and address: Logicon, Inc., Tactical &
    Training Systems Div, Post Office Box 80138
Controlling office name and address: Naval Training Equipment Center,
    Orlando, Florida 32813
Security class (of this report): Unclassified
Distribution statement (of this report): Approved for public release;
    distribution unlimited
Key words: automatic speech recognition, speech recognition system
    evaluation, reference data, feature selection, statistical analysis
    of speech

20. Abstract (cont.)

used in LISTEN. It is shown that interword timing and structural
peculiarities are the two most useful information sources for the two
speakers investigated. Statistical models of the information sources are
examined critically. The analyses reveal several ways to simplify and
improve the LISTEN algorithm. Users manuals for analysis programs and for
voice reference data generation programs are provided as appendices.




NAVTRAEQUIPCEN  78-C-0141-1 


FOREWORD 


Earlier efforts by LOGICON to develop a real-time connected speech
recognition system resulted in a system that uses hardware designed for
isolated word recognition (IWR), enhanced with connected speech
recognition software. This LISTEN system was reported in a series of
technical reports referenced herein.


The effort reported here has developed two products that support this
approach of combining high quality acoustical hardware, such as that used
for IWR, with sophisticated software for connected speech recognition.
One product is a set of software for formation of voice reference
patterns. The second product is a users' manual, included as an appendix
here, which details the techniques required to form reliable reference
data.

R.  BREAUX,  Ph.D. 
Scientific  Officer 
SECTION I
INTRODUCTION 


PURPOSE 

This report documents the work accomplished and results obtained during
the Voice Interactive Analysis System (VIAS) study project.

BACKGROUND 

The VIAS study was undertaken as part of a continuing effort to obtain a
capability for automatic recognition of connected speech which meets the
requirements of the Naval Training Equipment Center (NAVTRAEQUIPCEN) for
application in training systems. It is the natural outgrowth of previous
projects which led to the development of Logicon's Initial System for the
Timely Extraction of Numbers (LISTEN), a minicomputer-based, real-time
connected speech recognition system.

Projects which led to the development of LISTEN in December of 1977 did
not include extensive testing of that system, with the result that at their
conclusion the potential of LISTEN to support naval training applications
was not unambiguously demonstrated. Good speech recognition accuracy had
been obtained for one speaker (MWG), and poor but ambiguous test results
were obtained for another speaker (BRO), apparently due to equipment
problems or anomalous changes in the second speaker's voice.

At the termination of LISTEN's development it was also very difficult to
generate the voice reference data needed to use the system with a new
speaker or a new vocabulary. LISTEN relies heavily on processing a large
sample of voice data in order to produce a large amount of structural and
statistical data descriptive of the speaker's voice, with these data in a
form suitable to support real-time connected speech recognition. The voice
sample processing programs left after developing LISTEN were mostly dual
purpose programs, serving to support both research into the nature of the
voice data, and the extraction of voice parameters once those
characteristics with promise for recognition had been identified. The
processes used included minicomputer programs, programmable calculator
procedures, manual graphing and manual calculations. Upwards of forty
hours of both minicomputer and manual data processing were required to
develop the voice reference data.

The VIAS study was thus undertaken with two main purposes: to further
test and analyze LISTEN's performance, and to bring together the collection
of voice reference data generation procedures into a coherent set of
computer programs which could be delivered to the government. The
additional test and analysis of LISTEN was to be based upon a set of
computer programs for automatically classifying and gathering performance
data. Two auxiliary goals were also attached to the project. The first of
these was to transfer LISTEN technology from the speech preprocessor
(feature extractor) with which it was originally developed to a newer
version of that device, as the previous model has gone out of production.
The second was the extension upward of LISTEN's initial vocabulary size of
eleven words, as far as could easily be managed without major software
modification, toward thirty words.

REPORT  OVERVIEW 

Five  groups  of  tasks  were  identified  in  the  VIAS  Project  Work  Plan  Report, 
appropriate  to  the  project  goals  just  described.  The  relationship  of  each 
individual  task  to  this  report  is  described  below. 

TASK GROUP 1 - TECHNOLOGY TRANSFER. This group included four tasks
addressing the problem of transferring LISTEN technology from the Threshold
Technology Model VIP-100 speech preprocessor to its replacement, Model
TTI-500. Tasks 1a and 1b entailed gathering speech data for a single
speaker, and using the previously developed computer program GZEC to
discover structure in those data. These tasks are not described in detail,
as their purpose was to provide the data for tasks 1c and 1d. Task 1c was a
major analytic task, directed toward determining which acoustic features
extracted by the TTI-500 preprocessor are most useful for recognition. This
analysis is described extensively in Section IV. Task 1d, directed toward
verification of the feature selection, is also reported in that section.

TASK GROUP 2 - DEVELOP THE VOICE DATA GENERATION SYSTEM (VDGS). The four
tasks in this group brought together the various procedures used for
generating voice reference data to support real-time speech recognition by
LISTEN, in the form of a unified body of computer programs and a user's
manual. Tasks 2a, 2b and 2c entailed programming tasks and are not reported
upon further. Their end result is the VDGS, a set of computer programs
constituting a separate deliverable of this project. The fourth task, 2d,
was to produce a users guide for the VDGS, which is introduced in Section II
of this report, and included in its entirety as Appendix A.

TASK GROUP 3 - EXPAND VOCABULARY. The single task in this group was
executed in conjunction with tasks 2a, 2b and 2c. It entailed increasing
the maximum number of vocabulary items which can be accommodated by the
individual programs of the VDGS, when practicable, toward thirty words.
Results obtained are discussed in Section II, in connection with the VDGS.

TASK GROUP 4 - DEVELOP PERFORMANCE ANALYSIS SUBSYSTEM (PASS). The four
tasks in this group entailed the design, implementation and application of
a new set of computer programs for collecting and organizing data about
LISTEN's recognition performance. Also, the initial task, 4a, was directed
toward converting the real-time recognition components of LISTEN (the
programs LTRGEN, MEX and MINT) to operate in a new computer (Data General
S-130) and speech preprocessor (TTI-500) environment. The programming
tasks, 4a, 4b and 4c, are not described further, as their end result is the
set of programs comprising PASS, a separate deliverable. The programs are,
however, introduced in Section III of this report, and a Users Guide for
those programs is included as Appendix B. Task 4d, the application of some
elements of the PASS to automatically classify recognition errors committed
by LISTEN, is described in Section IV.

TASK GROUP 5 - CRITICALLY EXAMINE INFORMATION SOURCE MODELS. The four tasks
in this group were directed toward a detailed examination of the strengths
and weaknesses of LISTEN, by determining the relative importance of the
various information sources used in that system to achieve recognition. As
these were all analytical tasks, they are discussed extensively in
Section IV.

KNOWLEDGE  OF  LISTEN  ASSUMED.  As  LISTEN  is  a  complex  system,  based  on  some 
unique  approaches  to  obtaining  automatic  recognition  of  connected  speech,  this 
report  would  become  excessively  long  if  the  principles  and  details  of  operation 
of  LISTEN  were  described  in  a  self-sufficient  way  here.  The  remainder  of  this 
report  is  therefore  written  assuming  the  reader  has  an  understanding  of  LISTEN, 
to  a  level  of  detail  easily  accessible  in  the  final  reports  of  the  projects 
which led to its development. For convenience, these reports are identified
below.

a. Use of Computer Speech Understanding in Training: A Preliminary
Investigation of a Limited Continuous Speech Recognition Capability;
Technical Report NAVTRAEQUIPCEN 74-C-0048-2; Logicon, Inc.; June 1977.

b. LISTEN: A System for Recognizing Connected Speech Over Small, Fixed
Vocabularies, In Real Time; Report NAVTRAEQUIPCEN 77-C-0096-1; Logicon,
Inc.; April 1978.

SECTION II

THE VOICE DATA GENERATION SYSTEM (VDGS)


DESCRIPTION 

The VDGS consists of a collection of computer programs for collecting and
processing voice data to generate the voice reference data necessary to
recognize connected speech in real time with LISTEN. The end product of
these programs is a large set of data in the format of a standard Data
General data file, called the MIND file.

The twenty-nine programs comprising the VDGS are written in FORTRAN IV,
FORTRAN V and Data General Assembly Language. The programs are capable of,
and intended for, use on a Data General S-130 minicomputer equipped with at
least 32K words of memory, a 10-megabyte disc, the RDOS operating system
and standard peripherals. They may, however, be recompiled for execution on
other Data General minicomputers, such as the Nova 3.

Appendix A is a Users Manual for the VDGS. It contains instructions
whereby a qualified speech research technician, familiar with the
principles and details of LISTEN's operation, can collect speech data
(given the necessary equipment) and produce a MIND file for use with LISTEN.

The VDGS contains all programs necessary to collect speech data and
produce a MIND file. Of the twenty-nine programs, twenty-four must be used
in this process. The remaining five programs are often useful, but in
general are not needed to produce MIND files. All manual and extra-computer
procedures required to generate voice reference data prior to this project
have been automated and implemented as programs in the VDGS. However, human
surveillance and occasional modification of the generated data are
essential if the recognition performance of LISTEN is to be optimized. The
VDGS therefore exists in two forms: as a collection of independent programs
for individual execution, and as a "pushbutton" system requiring a minimum
amount of human intervention, known as CHAINMIND.

CHAINMIND consists of three segments: EXTRACT, GENTL and MAKEMIND. EXTRACT
is a program which facilitates gathering speech data samples in a format
suitable for use by the remainder of the VDGS. It includes prompting of the
speaker via the CRT display, with utterance contents taken from files
provided. Since the voice data are usually taken over several separate
recording sessions (perhaps over several days), there is a natural division
of the MIND file generation process at that point where all necessary
speech data have been recorded on disk, and the voice data processing can
begin.

Another reason the EXTRACT process is kept separate from the remainder of
the CHAINMIND version of VDGS is that a decision must be made at that point
with regard to separating the collected voice samples (which consist mostly
of multiple word utterances) into individual vocabulary items. This can be
done either automatically or manually. However, initial results using the
automatically generated individual vocabulary examples indicate that
reasonable recognition results cannot be obtained in this way. (See Section
IV for specifics about Example Set generation.) The Users Manual and the
VDGS contain instructions and aids for generating vocabulary item examples
manually.

The second segment of the CHAINMIND version of the VDGS, GENTL, consists
of programs which culminate in the generation of Transition Letter Sets for
each vocabulary item. Although this process needs no human intervention,
the Transition Letter Sets are so fundamental to the successful operation
of LISTEN that prudence dictates that they should be examined and in some
cases modified before continuing the MIND file generation process. This is
particularly true since the method used to generate Transition Letter Sets
(the algorithm GENRLIZ in the program GZEC) is heuristic in nature and
subject to the influence of extraneous details, such as the order in which
speech samples are presented to it.

The third segment of CHAINMIND contains the majority of the programs and
requires the majority of processing time. Here too, prudence dictates human
surveillance of every step of the process if LISTEN's performance is to be
optimized. Critical points at which intervention may be required cannot be
identified at this time, as only a small number of speakers' data have been
processed. The Users Manual contains some suggestions and remarks which may
be helpful in identifying anomalies.

For the reasons mentioned above, it is recommended that the VDGS be used as
a collection of individual program elements in accordance with the Users
Manual, with careful scrutiny of results at every step.

VOCABULARY  SIZE 

LISTEN was initially developed for an eleven-word vocabulary, and many
of the programs now included in the VDGS were developed to operate with
about that many vocabulary items. Under Task 3a, the vocabulary capacity of
many of these programs has been extended toward thirty, and programs
developed during this project for inclusion in the VDGS have, as far as
possible, been constructed to accommodate the larger vocabulary.

The tabulation below gives the program name and the vocabulary size
capability of the individual programs in the VDGS, as delivered.


PROGRAM     VOCABULARY SIZE

EXTRACT     any
INVERT      30
MUTE        30
ESG         13
CROAK       15
GLOVE       30
GZEC        13
REVEX       13
TAILOR      30
RESCUE      30
ADDER       13
BUILDER     any
SIGH        30
AVRAJ       13
DEALER      13
LOOPER      13
CRAP        13
PHEW        30
REVEXA      13
GAPSTER     13
GASP        15
RVDIT       13
SORTA       any
ESDIT       11
COVERT      15
SORTB       any
ESGDIT      13

Extending the vocabulary size capability of the programs in the VDGS which
are not yet capable of handling thirty vocabulary items would require more
or less extensive modification of those programs. Doing so during this
project was judged an inappropriate use of project resources, as discussed
in the Work Plan Report. In this connection, it should be noted that two of
the three programs needed for real-time recognition (MEX and MINT) are
limited to thirteen vocabulary items, and modification of one of them (MEX)
to accommodate a larger vocabulary would require several labor months of
effort. The difficulty of such an extension stems from the complexity of
the MEX data structure, not from any inherent limitation of the recognition
algorithm.
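
The practical vocabulary limit of a processing chain is set by its most
restrictive program. The following Python sketch illustrates that
reasoning with a subset of the capacities tabulated above, together with
the MEX and MINT limits quoted in the text. The dictionary layout, the
function names and the use of infinity to stand for "any" are illustrative
assumptions and are not part of the VDGS.

```python
import math

# Capacities copied from the tabulation (a subset, for brevity) plus the
# MEX and MINT limits stated in the text. Modelling "any" as infinity is
# an assumption of this sketch.
CAPACITY = {
    "EXTRACT": math.inf,
    "INVERT": 30,
    "CROAK": 15,
    "GZEC": 13,
    "TAILOR": 30,
    "MEX": 13,
    "MINT": 13,
}


def chain_capacity(capacity):
    """Return the largest vocabulary that every listed program can handle."""
    return min(capacity.values())


def bottlenecks(capacity):
    """Return the programs whose limit equals the chain capacity."""
    limit = chain_capacity(capacity)
    return sorted(name for name, size in capacity.items() if size == limit)


print(chain_capacity(CAPACITY))  # 13
print(bottlenecks(CAPACITY))     # ['GZEC', 'MEX', 'MINT']
```

Under these assumptions, raising the limit of any program other than the
listed bottlenecks would leave the capacity of the chain unchanged.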
SECTION III

THE PERFORMANCE ANALYSIS SUBSYSTEM (PASS)


The PASS consists of three computer programs for collecting, processing
and plotting data useful in analyzing many aspects of LISTEN's performance.
These programs are supported by a special version of LISTEN in which the
MEX program produces data files in the format required for processing by
the PASS program BIGMINT. The other two PASS programs (STATSUM and LICVAT)
operate on data provided by BIGMINT.

The programs in the PASS operate in the same environment as LISTEN and the
VDGS.

Appendix B is a Users Manual for the PASS. It contains instructions for
using these programs for extracting and processing data from files produced
by LISTEN. Section IV of this report describes several analytical
investigations which were based on data derived and processed by the PASS.
Those analyses are thus examples of the varied possible uses of data
generated by the PASS.

Specific  information  elements  developed  by  programs  in  PASS  are  discussed 
below. 

BIGMINT. This program provides the following data:

a. An annotated listing of the entire MIND file.

b. Date and identification of the MEX-generated file used to produce the
following data items for each utterance in that file.

c. Compressed speech data file identifier.

d. Potential recognitions detected by MEX, with the following data for
each potential recognition.

(1) Machine type

(2) Vocabulary item

(3) T-state counter statistic QT

(4) L-state counter statistic QL

(5) Start time

(6) Recognition time

(7) Associated vocabulary items and forms

(8) a priori cost

(9) Violation category cost

(10) QT cost

(11) QL cost

(12) Total cost reported by MEX

(13) Association cost

(14) Total cost assigned to the potential recognition in MINT

(15) Identification of optimal predecessor

(16) Interword gap cost to optimal predecessor

(17) Total cost from this node upward along optimal path.

e.  Costs  for  all  interword  gaps  between  potential  predecessors. 

f.  Identification  of  the  ten  lowest  cost  paths  through  the  graph  of  the 
utterance,  and  for  each: 

(1)  whether  correct  or  incorrect 

(2)  total  cost 

(3)  vocabulary  items 

(4)  nodes. 

g. Vocabulary items actually spoken.

h. For the entire file of utterances, the number correctly and the number
incorrectly recognized.
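
The listing of the ten lowest cost paths in item f can be illustrated with
a small best-first search. The Python sketch below is not the MINT
algorithm, whose details are given in the LISTEN reports. It assumes an
acyclic graph of potential recognitions, non-negative costs, and a path
cost equal to the sum of node costs and interword gap costs. The function
name, the argument layout and the toy graph are hypothetical.

```python
import heapq


def k_best_paths(node_cost, gap_cost, start, end, k=10):
    """Return up to k (cost, path) pairs, cheapest first.

    node_cost: dict mapping node -> cost of that potential recognition
    gap_cost:  dict mapping (pred, succ) -> interword gap cost of that edge
    Assumes all costs are non-negative and the graph has no cycles.
    """
    successors = {}
    for (pred, succ), cost in gap_cost.items():
        successors.setdefault(pred, []).append((succ, cost))

    # Each heap entry is a partial path with its accumulated cost. With
    # non-negative costs, complete paths are popped in cost order.
    heap = [(node_cost.get(start, 0), [start])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        last = path[-1]
        if last == end:
            found.append((cost, path))
            continue
        for succ, gap in successors.get(last, []):
            total = cost + gap + node_cost.get(succ, 0)
            heapq.heappush(heap, (total, path + [succ]))
    return found


# Toy utterance graph with three potential recognitions A, B and C.
NODE_COST = {"START": 0, "A": 4, "B": 6, "C": 3, "END": 0}
GAP_COST = {
    ("START", "A"): 1, ("START", "B"): 2,
    ("A", "C"): 2, ("B", "C"): 1,
    ("A", "END"): 9, ("C", "END"): 1,
    ("START", "END"): 20,  # direct connection: no spoken word
}

for cost, path in k_best_paths(NODE_COST, GAP_COST, "START", "END", k=3):
    print(cost, path)
```

The search keeps whole partial paths on the heap, which is adequate for the
small graphs of a one-to-four-word utterance but would not scale to large
graphs.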

STATSUM. This program provides the following data:

a.  An  annotated  listing  of  the  entire  MIND  file. 


b. For each utterance processed by BIGMINT, the index of the utterance
within the MNSET, identifier of the compressed speech data file, and what
was actually spoken.

c. For each of the ten best paths through the graph of the utterance, in
increasing order of total path cost:

(1) whether correct or incorrect

(2) total a priori cost

(3) total violation cost

(4) total QT cost

(5) total QL cost

(6) total of costs reported by MEX for all nodes of the path

(7) total association cost

(8) total of costs assigned by MINT to all nodes of the path

(9) initial delay cost

(10) total interword gap cost

(11) final delay cost

(12) total interword timing cost

(13) total of all costs for the path

(14) nodes of the path.

d. Category and type of the recognition problem posed by this utterance
(as defined in Section IV).

e.  The  difference  in  all  costs  listed  in  c.  above,  between  all  incorrect 
paths  and  the  best  correct  path,  or  an  indication  that  no  best  path  exists. 
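
The cost differences of item e can be sketched as follows. This Python
example is an illustration only: the reduced list of cost components, the
record layout, the sign convention (incorrect path minus best correct
path) and the reading of "no best path exists" as "none of the listed
paths is correct" are all assumptions, not details taken from STATSUM.

```python
# Hypothetical, shortened list of the cost components named in item c.
COMPONENTS = ("a_priori", "violation", "QT", "QL", "association", "timing")


def total(path):
    """Sum all cost components of one path record."""
    return sum(path["costs"].values())


def cost_differences(paths):
    """Compare every incorrect path with the cheapest correct path.

    paths: list of dicts with keys 'correct' (bool) and 'costs' (dict).
    Returns a list of per-component difference dicts, or None when no
    correct path is present among the listed paths.
    """
    correct = [p for p in paths if p["correct"]]
    if not correct:
        return None
    best = min(correct, key=total)
    diffs = []
    for p in paths:
        if not p["correct"]:
            d = {c: p["costs"][c] - best["costs"][c] for c in COMPONENTS}
            d["total"] = total(p) - total(best)
            diffs.append(d)
    return diffs


PATHS = [
    {"correct": False,
     "costs": {"a_priori": 5, "violation": 0, "QT": 3, "QL": 2,
               "association": 1, "timing": 4}},
    {"correct": True,
     "costs": {"a_priori": 4, "violation": 2, "QT": 3, "QL": 4,
               "association": 1, "timing": 6}},
]

print(cost_differences(PATHS))
```

In this toy case the incorrect path is cheaper overall, and the
per-component differences show which cost terms favored it.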

LICVAT.  This  program  provides  the  information  listed  below.  Several  of  the 
quantities  mentioned  are  defined  in  Section  IV. 

a.  An  enumeration  of  utterances  in  Category  0,  with  identification  of 
the  compressed  speech  data  file  and  the  index  of  the  utterance  within  the 
MNSET. 

b.  The  enumeration  of  utterances  in  Category  1,  of  type  other  than  (0,1), 
(1,0)  or  (1,1).  For  each  utterance  the  following  data  are  given: 

(1) compressed speech data file identifier

(2) utterance index within its MNSET

(3) whether it was correctly or incorrectly recognized

(4) all costs enumerated in item c. for the program STATSUM, for the
best path

(5) utterance type.

c. An enumeration of utterances in Categories 2 and 3, with compressed
speech data file identifier and utterance index within its MNSET.

d. An enumeration of utterances in Category 1 in which the best incorrect
path is a direct start-to-end node connection (corresponding to no spoken
word), with compressed speech data file identifier and utterance index
within its MNSET.

e. A list of all utterances in Category 1, ordered on the basis of the
difference in costs of various kinds between the best incorrect and the
best correct path (the M value defined in Section IV). For each utterance
the following data are given:

(1)  compressed  speech  data  file  identifier 

(2)  utterance  index  within  its  MNSET 

(3)  cost  difference  (M  value) 

(4)  whether  correctly  or  incorrectly  identified 

These  ordered  lists  are  generated  for  each  of  the  cost  contributions  mentioned 
in  connection  with  program  STATSUM,  item  c. 

f.  A  computer  generated  plot  of  the  cumulative  distribution  of  M  values, 
for  all  utterances  and  for  incorrectly  recognized  utterances  only.  Histogram 
data  are  also  given  for  M  increments  of  10,  for  all  utterances  and  incorrectly 
recognized  utterances  only. 

g. All data described in e. and f. above, but restricted to utterances
in Category 1 of the following types:

(1)  (0,1) 

(2)  (1,0) 

(3)  (1,1) 

h. For all real recognitions, counts of the number of associated
recognitions of each vocabulary item and form, and the total number of
associated recognitions, by vocabulary item and form.

i. As in h., but for all artifactual recognitions.

j. For all real recognitions, counts of occurrences of each violation
category, and the total over all violation categories, by vocabulary item
and form. Also, the total number of real recognitions (over all vocabulary
items) of each violation category.

k. As in j. above, but for all artifactual recognitions.

l. For all real recognitions, a computer-generated plot of the cumulative
distribution of the QL linearizing function, f, with histogram data.

m. As in l. above, but for all artifactual recognitions.

n. The following data about initial delays, categorized by real versus
artifactual recognition and by vocabulary item and form:

(1) total number of recognitions in the category

(2) number of zero delay values

(3) fraction of cases which were zero

(4) the average of the non-zero initial delays

o. As in n. above, but for final delays.

p. A computer-generated plot of the cumulative distribution of the
interword gap normalizing function, f, for all interword gaps between
contiguous real recognitions.

q. As in p. above, but for all interword gaps between recognitions and
their potential predecessors, other than contiguous real recognitions.
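
Several of the LICVAT outputs listed above are simple summaries that can be
sketched in a few lines of Python. The example below is an illustration
under stated assumptions: the sign of the M value (best incorrect cost
minus best correct cost) is assumed here, since the definition is given in
Section IV, and all function names and sample numbers are hypothetical.

```python
def m_value(best_incorrect_cost, best_correct_cost):
    """Cost margin between the best incorrect and best correct paths.

    Assumed sign: a positive value means the correct path was cheaper.
    """
    return best_incorrect_cost - best_correct_cost


def histogram(values, width=10):
    """Count values in bins [edge, edge + width), keyed by lower edge."""
    counts = {}
    for v in values:
        edge = (v // width) * width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def cumulative(values):
    """Map each distinct value to the fraction of values not exceeding it."""
    ordered = sorted(values)
    n = len(ordered)
    result = {}
    for i, v in enumerate(ordered):
        result[v] = (i + 1) / n  # later duplicates overwrite earlier ones
    return result


def delay_summary(delays):
    """Summarize initial or final delays in the manner of items n and o."""
    zeros = sum(1 for d in delays if d == 0)
    nonzero = [d for d in delays if d != 0]
    return {
        "total": len(delays),
        "zero": zeros,
        "zero_fraction": zeros / len(delays) if delays else 0.0,
        "mean_nonzero": sum(nonzero) / len(nonzero) if nonzero else None,
    }


# Hypothetical (best incorrect, best correct) path costs for four utterances.
COST_PAIRS = [(25, 20), (12, 30), (41, 40), (58, 33)]
m_values = [m_value(i, c) for i, c in COST_PAIRS]

print(m_values)                     # [5, -18, 1, 25]
print(histogram(m_values))          # {-20: 1, 0: 2, 20: 1}
print(cumulative(m_values))
print(delay_summary([0, 0, 3, 5]))
```

The bin width of 10 matches the M increments mentioned in item f. The
floor division places negative values in the bin whose lower edge is at or
below the value.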
SECTION IV
ANALYSES 

The analyses performed in the VIAS study are reported in this section.
These analyses were major parts of Task Groups 1 and 5 and a minor part of
Task Group 4. The analyses fall naturally into two categories. The first
category (Task Group 1) is concerned with transferring the connected speech
recognition capability developed with the VIP-100 speech preprocessor to
its successor, the TTI-500. The second category (Task Groups 4 and 5) is
concerned with a critical examination of the LISTEN speech recognition
algorithm, to determine its strengths and weaknesses, in hopes of
discovering fruitful approaches to improving its performance and easing the
task of applying it in automated training systems.

Underlying each type of analysis were several steps: determination of the
types of data needed, design of an algorithm for extracting the data,
implementation of a program or program segment for extracting the data (as
part of the Performance Analysis Subsystem, PASS) and, finally, extracting
and analyzing the data thus obtained. In the description of the analyses
presented below, only the nature of the data used, the data themselves and
the analysis of the data are discussed. Designing, implementing and
exercising the relevant portion of the PASS are not discussed, although
those efforts consumed a significant portion of project resources.
Instructions for using the PASS to develop data of the type presented in
connection with the following analyses are given in Appendix B, the PASS
Users Manual.

The  remainder  of  this  section  is  divided  into  five  parts  addressing: 

a.  The  experimental  bases  used  in  the  analyses. 

b.  The  transfer  of  technology  (Task  Group  1). 

c. The contribution of each information source to recognition (Task 5d).

d. The analysis of recognition errors (Tasks 4c and 5d).

e. The critical examination of information source models (Tasks 5a and
5b).

EXPERIMENTAL  BASES  FOR  THE  ANALYSES 

Voice data were collected, and MIND files were created, for two new
speakers (LHN, JEP) during the course of this project. Voice data for
testing were also collected for these two speakers, and LISTEN was
exercised on these data. Finally, the performance analysis programs in the
PASS were exercised on output obtained from LISTEN for speakers MWG, LGN
and JEP. In this way the programs of the VDGS and the PASS were validated,
and data were generated for the analysis tasks of the project.

Speech data were collected in several (five to ten) sessions over a few
days. After all of the data from each speaker had been collected, they were
divided into three equal parts called Training, Interim Test and Test data.
These terms were inherited from the LISTEN development project, wherein the
second set of data was used to test some initial concepts. In this project
the Interim Test data were simply used to extract certain speech
characteristic data not obtainable from Training data.

Each  set  of  data  consisted  of  six  "Magic  Number  Sets".  Each  of  these  is 
fifty-five  utterances  of  from  one  to  four  words,  arranged  in  a  format  which 
makes  the  numbers  appear  to  the  speaker  to  be  quite  random.  They  are  actually 
a  carefully  balanced  set  of  vocabulary  items,  combined  in  such  a  way  that  each 
digit  occurs  an  equal  number  of  times,  and  the  word  "point"  nearly  as  often,  and 
so  that  each  transition  between  vocabulary  items  appears  exactly  once. 
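
The balance properties described above can be checked mechanically. The
Python sketch below counts word occurrences and within-utterance
transitions for a list of utterances and reports any transition that occurs
more than once. The function names and the three-word toy vocabulary are
hypothetical; the actual Magic Number Sets are not reproduced here.

```python
from collections import Counter


def word_counts(utterances):
    """Count how often each vocabulary item occurs across all utterances."""
    return Counter(word for words in utterances for word in words)


def transition_counts(utterances):
    """Count ordered (previous word, next word) pairs within utterances."""
    counts = Counter()
    for words in utterances:
        counts.update(zip(words, words[1:]))
    return counts


def repeated_transitions(utterances):
    """Return transitions that occur more than once (ideally none)."""
    counts = transition_counts(utterances)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Toy set over the items "one", "two" and "point".
TOY_SET = [
    ["one", "two", "point"],
    ["two", "one"],
    ["point", "one", "one"],
]

print(word_counts(TOY_SET))
print(repeated_transitions(TOY_SET))                     # []
print(repeated_transitions(TOY_SET + [["one", "two"]]))  # [('one', 'two')]
```

Transitions are counted only inside an utterance, on the assumption that
the pause between utterances is not a word-to-word transition.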

Each major data set (Training, Interim Test and Test) thus consisted of
three hundred thirty utterances containing one thousand fifty words. Training
data were used to generate structural characteristics of each vocabulary
item (Transition Letter Sets and Loop Letter Sets), and some statistical
properties of the voices were extracted from Interim Test data. Test data were
used only for testing purposes. Most results in the following are therefore
based on Test data. The exception is the investigation of statistical models,
where data are compared for Interim Test data and Test data, to determine the
validity of certain statistical assumptions.
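
The sizes quoted for the data sets can be cross-checked with a little arithmetic. The sketch below is illustrative only. The breakdown of transitions into digit-to-digit pairs and pairs involving "point" is an assumed reading that happens to agree with the stated totals; it is not a figure given in this report.

```python
# Consistency check on the data-set sizes quoted in the text.
SETS_PER_DATA_SET = 6        # Magic Number Sets per major data set
UTTERANCES_PER_SET = 55      # utterances in one Magic Number Set
WORDS_PER_DATA_SET = 1050    # words in one major data set

utterances = SETS_PER_DATA_SET * UTTERANCES_PER_SET
words_per_set = WORDS_PER_DATA_SET // SETS_PER_DATA_SET

# An utterance of n words contains n - 1 word-to-word transitions.
transitions_per_set = words_per_set - UTTERANCES_PER_SET

# Assumed reading (not stated in the report): every digit-to-digit pair,
# every digit-to-"point" pair and every "point"-to-digit pair occurs once,
# and "point" never follows "point".
assumed_transitions = 10 * 10 + 10 + 10

print(utterances, words_per_set, transitions_per_set, assumed_transitions)
```

Under that assumed reading, the number of within-utterance transitions in one Magic Number Set matches the number of distinct vocabulary-item transitions, which agrees with the statement that each transition appears exactly once.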

The voice data taken from the two new speakers were processed differently,
with qualitatively different recognition performance results, as is described
in the following paragraphs. New data for only one speaker (LHN) were
therefore usable in the detailed analyses of LISTEN performance.

An experimental variable of considerable interest in connection with the
VDGS was also investigated in this study. This variable, Example Set
Generation, relates to the way in which speech data are separated into sets of
examples of individual vocabulary items. This step is necessary for the
generation of Transition Letter Sets by the program GZEC. Two approaches have
been used to segment the samples of connected speech. Originally, computer
printouts of speech preprocessor data were scanned by eye, and segments within
each utterance which contained each individual vocabulary item were identified
visually and recorded manually. These segments were selected to contain the
vocabulary item with high confidence, but with as little additional material
as possible, in the judgement of the person marking the data. This remains
the recommended way to produce the needed example sets.

As described in Reference 1, an automatic method of generating sets of
individual vocabulary samples has also been developed. This procedure,
embodied in the program ESG (Example Space Generator), applies statistics
derived from MWG's voice to excise segments of a multiword utterance which
contain individual vocabulary items with high confidence.

This, of course, entails a tradeoff between taking a large segment
including extraneous material, and taking a smaller segment with attendant
increased risk of excluding some portion of the spoken word. Since the nature
of GZEC makes it much more sensitive to the deletion of parts of words than to
the inclusion of extraneous material, the safety factors used in ESG (to
accommodate statistical variability in articulation and still extract
unclipped vocabulary item examples) are quite high. Not surprisingly, the
safety factors used in ESG make the word length statistics derived from MWG's
voice data apply to other speakers as well. Using ESG to generate vocabulary
item samples is therefore an alternative to doing so manually when the VDGS is
applied to a new speaker. This is the additional experimental variable which
was investigated in this project, by using the manual procedure for LHN's
voice data and the automatic procedure embodied in ESG for JEP's voice data.

Although the transition letter sets generated by these methods do not appear
qualitatively different (see Figures 1 and 2), the recognition performance
obtained using ESG was significantly inferior to that obtained by using the
manual procedure, as the following data show:

                              Example
            Speech            Generation         % Utterances
Speaker     Data              Method             Correct

MWG         Interim Test      Manual             94
MWG         Test              Manual             86
LHN         Interim Test      Manual             74
LHN         Test              Manual             70
JEP         Interim Test      Automated (ESG)    42
JEP         Test              Automated (ESG)    44

Detailed examination of LISTEN's performance for speaker JEP shows that
the Transition Letter Sets are not effective; MEX very frequently does not
detect a word actually spoken as potentially present in the utterance. This
occurs relatively infrequently for MWG and LHN. The difference in Transition
Letter Sets is presumably due to the extraneous material in the set of examples
from which the JEP Transition Letter Sets were derived.

The failure of MEX to detect the potential occurrence of an actually
spoken word also seriously perturbs the voice reference data extraction
process, which contributes to the poor performance. (Notice that recognition
results for JEP are better on Test data than on Interim Test data, even though
the statistical data are extracted from the Interim Test data.) For these
reasons, the data for JEP, while contributing a significant result to the
project as a whole, were not used in the detailed examination of LISTEN, as
its performance with bad reference data is not indicative of its true
potential.

[Figure 1. Transition Letter Sets (TLS) for vocabulary items "ZERO" through
"NINE" and "POINT"; the listing and the remainder of the caption are illegible
in the source.]

[Figure 2 listing illegible in the source.]

Figure 2. Transition Letter Sets (TLS) for Vocabulary Items "ZERO"
through "NINE" and "POINT" for Speaker JEP, Generated from
Manually Produced Examples

TRANSFER OF TECHNOLOGY

The LISTEN connected speech recognition system was developed using
Threshold Technology Corporation's speech preprocessor Model VIP-100, which is
no longer being produced. Its successor, the Model TTI-500, is based on a
similar principle of operation, provides output which is identical to that of
its predecessor in terms of the electrical interface and digital format, and is
expected to be available for a considerable time into the future. It is
therefore both feasible and desirable to bring LISTEN into accommodation with
the newer version of the speech preprocessor.

The principal difference between the older and newer preprocessor is the
acoustical significance of some of the speech features recognized by the
respective devices. Only eight features are common to the two devices. In both
cases thirty-two features are determined to be either present or absent at a
nominal rate of 500 times per second, and this determination is encoded as two
sixteen-bit binary words, transmitted to the central processor during detected
periods of speech. As LISTEN was purposefully developed to discover and to
recognize patterns in a stream of binary data, without recourse to the acoustic
significance of the data, very little change in LISTEN is required to
accommodate the new preprocessor. As LISTEN uses only sixteen of the thirty-two
bits, or features, received from the preprocessor every two milliseconds, the
only requirements to adapt LISTEN to the new preprocessor are to select which
sixteen of the available thirty-two features to use, and to change the
interface accordingly. The analysis required in support of the transfer of
LISTEN technology to the new preprocessor thus reduces primarily to selecting
the features to use and secondarily to verifying the selection.
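
The interface change described above amounts to picking sixteen of the thirty-two feature bits in each two-millisecond frame and packing them into one sixteen-bit word. The following is a minimal sketch of that step. The function name `select_features` and the bit layout (features 0-15 in the first word, 16-31 in the second, least significant bit first) are assumptions made for illustration; the actual preprocessor format is not specified here.

```python
from typing import Sequence


def select_features(word_a: int, word_b: int, selected: Sequence[int]) -> int:
    """Pack sixteen chosen feature bits (indices 0..31) into one 16-bit word.

    Assumed layout: features 0-15 are bits 0-15 of word_a and
    features 16-31 are bits 0-15 of word_b.
    """
    if len(selected) != 16 or len(set(selected)) != 16:
        raise ValueError("exactly sixteen distinct feature indices are required")
    # Combine the two 16-bit words into a single 32-bit frame.
    frame = (word_a & 0xFFFF) | ((word_b & 0xFFFF) << 16)
    packed = 0
    for out_bit, feature in enumerate(selected):
        if not 0 <= feature < 32:
            raise ValueError("feature index out of range")
        # Copy the chosen feature bit into the next output position.
        packed |= ((frame >> feature) & 1) << out_bit
    return packed


# Hypothetical selection: feature 0, feature 31, then features 1..14.
example = select_features(0x0001, 0x8000, [0, 31] + list(range(1, 15)))
```

Because the recognizer treats the result as an uninterpreted bit pattern, only the choice of indices matters, not their acoustic meaning.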

FEATURE SET SELECTION. The only constraint which must be met in selecting
sixteen of the thirty-two features available is that the long pause feature,
LPg, which indicates the end of an interval of vocalization, must be among
them. Any other fifteen features could be used in conjunction with LPg. The
vocalization indicator, LP^, must be included in the selected features because
it is used as the indicator for end of utterance processing in LISTEN. The
problem is thus reduced to selecting fifteen features among thirty-one
available.

Using a single feature several times, i.e., forming a sixteen-bit computer
word by selecting fewer than fifteen features (plus LP^) and setting
several bits equal to a single feature indicator, has no utility. This is
because LISTEN is sensitive only to the information content of each feature
position, so that the same results would be obtained by using a smaller number
of features, each represented only once. Since adding different features
to a pre-existing set of distinct features has the potential (at least) of
increasing the available amount of information about what was spoken, only
sets of fifteen distinct features need be considered.

An "ideal" method for selecting the set of features to be used would be
to directly evaluate the recognition performance obtained with alternate sets
of features. Any other method must be considered to be indirect, and must be
based on some assumptions about how the recognition performance would be
affected by different feature characteristics. Direct evaluation of even a few
alternative feature sets is quite impractical, however, as more than forty
hours of computer processing time is the minimum required to evaluate
recognition performance. Since there are over three hundred million subsets of
fifteen items taken from a set of thirty-one items, some method of
pre-selecting a (very much) smaller collection of alternatives must be used
anyway. Practical necessity therefore drives one to an indirect method of
selection.
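
The size of the search space can be confirmed directly. The sketch below counts the fifteen-element subsets of thirty-one features and, taking the forty-hour figure as a lower bound per evaluation, estimates the total processing time for an exhaustive search. The variable names are illustrative.

```python
import math

# Number of ways to choose 15 features from the 31 candidates.
n_subsets = math.comb(31, 15)

HOURS_PER_EVALUATION = 40  # lower bound quoted in the text
total_hours = n_subsets * HOURS_PER_EVALUATION
total_years = total_hours / (24 * 365)

print(f"{n_subsets:,} subsets, about {total_years:,.0f} years of processing")
```

The count is 300,540,195, so exhaustive evaluation would take more than a million years of processing at forty hours per feature set.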

One indirect method of feature selection is to refer to authority. In
this case the unquestioned leading authorities on the acoustical significance
of features are the personnel at the preprocessor manufacturing facility, where
the circuitry for extracting the available features was developed. The
manufacturer (Threshold Technology, Inc.) most cooperatively suggested a set of
fifteen features which, in their judgement, would work well in the LISTEN
environment. Since LISTEN is a complex algorithm which had not been thoroughly
tested at the time, the manufacturer's suggested set of features must be
regarded as an informed opinion rather than a definitive solution to the
problem. This opinion is based on extensive testing of many different features.
(Presumably in the context of isolated word/phrase recognition, which, while
different in many practical respects from connected speech recognition, should
nevertheless exhibit similar sensitivity to the utility of a feature for
recognition.) The set of features suggested by the manufacturer is the set
ultimately selected for use in LISTEN, for reasons described in the following
discussion.

An attempt was made to measure objectively the utility of each feature
for recognition. The approach used was to posit several different measures
of feature "quality", obtain values for these measures and analyze the
results. The measures posited were based on plausible judgements about
observable characteristics of a feature which carries a large amount of
information which would be useful for distinguishing among vocabulary items.

This approach to evaluating features suffers several shortcomings, in
spite of its intuitive appeal. Most serious of these shortcomings, perhaps,
is the questionable nature of the assumption that features can be evaluated
individually. The recognition procedure used in LISTEN is based upon detecting
the simultaneous presence, or absence, of several features in the preprocessor
output. It is therefore possible that there is no measure of effectiveness of
individual features, and only the effectiveness of sets of features can be given
concrete meaning. The actual situation is probably intermediate between the
inherent extremes. That is, there probably are indicators of individual
feature utility such that selecting those fifteen features with highest utility
produces an excellent, but not necessarily the best possible, choice.

Another difficulty with evaluating quality measures of features is a
practical one. Data must be collected from a particular speaker or speakers,
speaking phrases from a particular vocabulary, raising the difficult question
as to how valid the results might be for other speakers and other vocabularies.
In the VIAS project the available resources allowed examining data collected
from a single speaker (LHN), speaking only the digits. (The primary
limitation here was labor required to do the analysis in a timely manner, as
voice data were available from several other speakers.)

A  third  difficulty  with  basing  feature  selection  on  evaluation  of  some 
intuitively  appealing  measures  of  feature  quality  is  that  the  quality  measures 
themselves  are  entirely  ad  hoc,  as  it  is  not  practical  to  test  the  quality 
measures  for  the  same  reasons  that  it  is  not  practical  to  evaluate  alternative 
selections  of  features. 

On the positive side, there is a possibility that meaningful individual
feature quality measures can be posited and their evaluation may give some
clear indication of the utility of at least some features. The quality
measure approach was followed in the hope that this would be the case.

Quality Measures. Six measures of individual feature quality were posited and
evaluated. One of these (VFO) is defined in terms of the frequency of
occurrence of a feature in various vocabulary items. The other five are
attempts to measure the amount of reliable "structure" (reliably occurring
sequences of feature-present/feature-absent zones) which exists in a large
sample of vocalization of a given vocabulary item.

The quality measures were evaluated on a data set extracted by the TTI-500
while the subject (LHN) spoke various vocabulary items in connected
combinations. Individual vocabulary items were visually identified within
computer printouts of the features detected by the preprocessor. Distributing
the segmented connected speech data into example sets for each vocabulary item
provided the data base needed for evaluating the quality measures.

In order to evaluate quality measures other than the first, some way to
recognize and extract reliably occurring patterns of a feature's history within
each vocabulary item was needed. The program GZEC, incorporating the algorithm
GENRLIZ, was used for this purpose. (This program and algorithm are part of,
and described in connection with, the VDGS, and in previous LCSR project
reports.) GZEC was utilized two times for each vocabulary item (in this case
just the digits 0-9), first operating on data containing the manufacturer's
recommended set of fifteen features (hereafter called the Initial Feature
Set), and second operating on data containing only the other sixteen features.
GZEC extracted Transition Letter Sets from sixty-six examples of each
vocabulary item. These Transition Letter Sets exhibit the pattern with which
each feature occurs reliably in the sample of sixty-six vocalizations of each
vocabulary item. The feature quality measures (other than the first) are
therefore defined in terms of the patterns the feature follows, as revealed in
the Transition Letter Sets for each vocabulary item.

Since the Transition Letter Sets obtained for a collection of utterances
are dependent upon interaction between features, it cannot be assumed that this
method of processing treats all features identically. Attempts to eliminate
this potential bias are frustrated by the fact that GZEC can process at most
sixteen features in a single run, and there are hundreds of millions of ways
to select sixteen features from the available thirty-one.

Definition of Feature Quality Measures. The individual feature quality
measures used in this investigation are:

a. Variance of Frequency of Occurrence (VFO). If a feature occurs very
frequently in some vocabulary items, about half the time in others, and very
infrequently in still others, that feature would be useful for distinguishing
among some vocabulary items. The quantity VFO measures the vocabulary item
dependent variability of frequency of occurrence of a feature. It is the
variance, across vocabulary items, of the average frequency of occurrence of
the feature in each vocabulary item. It is determined by the equation:

\[ \mathrm{VFO} = \frac{1}{|V|} \sum_{v \in V} (\mu_v - \mu)^2 \]

where $|V|$ is the number of vocabulary items,

$\mu_v$ is the average frequency of occurrence of the feature in samples
of vocabulary item $v$, and

$\mu$ is the average of $\mu_v$ over all vocabulary items $v$.
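
The variance-of-frequency measure defined in (a) is a population variance taken across vocabulary items. A minimal sketch follows; the function name `vfo` and the example frequencies are hypothetical and are not data from this study.

```python
from statistics import fmean


def vfo(mean_freq_by_item: dict[str, float]) -> float:
    """Population variance, across vocabulary items, of a feature's
    average frequency of occurrence within each item."""
    values = list(mean_freq_by_item.values())
    mu = fmean(values)  # average of the per-item frequencies
    return sum((mu_v - mu) ** 2 for mu_v in values) / len(values)


# Hypothetical feature: frequent in one word, rare in another.
spread = vfo({"zero": 0.9, "one": 0.5, "two": 0.1})
# Hypothetical feature that occurs equally often in every word.
flat = vfo({"zero": 0.5, "one": 0.5, "two": 0.5})
```

A feature whose frequency is the same in every word scores zero, while one that is common in some words and rare in others scores higher.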


b. Frequency of Zero and One (F01). Each feature position in the
Transition Letter Sets indicates the reliably occurring pattern of development
of that feature in the word. Each Transition Letter Set indicates that, at its
corresponding point in the word, the feature is either reliably present
(indicated by 1), reliably absent (indicated by 0), or not reliably either
present or absent (indicated by a blank). If a feature has a rich and reliable
pattern of occurrence and/or non-occurrence in a word, then the number of 0's
and 1's for that feature is large compared to the number of blanks. The
average fraction of Transition Letter Set occurrences which are zero or one is
F01 [defining equation illegible in the source], where

$\#_k(T)$ is 1 if the feature in question has value $k$ (0 or 1) in
Transition Letter Set $T$, and 0 otherwise.

c. Vocabulary Variance of Frequency (VVF). While a feature with low
frequency of required presence or absence (low F01) must have marginal utility
for recognition, variability over vocabulary items of the frequency of required
presence or absence might indicate high vocabulary item dependence of the
feature. Thus VVF is defined to be the variance over vocabulary items of the
frequency with which the feature is required to be present or absent. That is,
VVF is the variance of F01 determined for each vocabulary item:

\[ \mathrm{VVF} = \frac{1}{|V|} \sum_{v \in V} \left( \mathrm{F01}_v - \mathrm{F01} \right)^2 \]

where

\[ \mathrm{F01}_v = \frac{1}{M_v} \sum_{i=1}^{M_v} \left[ \#_0(T_{i,v}) + \#_1(T_{i,v}) \right] \]

and other quantities are defined in (b) above.
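
The per-item zero-or-one frequency, and its variance across vocabulary items as defined in (c), can be sketched as follows. Each Transition Letter Set is represented here as a string with one character per feature position ('0', '1' or a blank); that representation, the function names, and the use of an unweighted mean of the per-item values as the overall average are all assumptions made for illustration.

```python
def f01_item(tls_list: list[str], feature: int) -> float:
    """Fraction of an item's Transition Letter Sets in which the given
    feature position is a required 0 or a required 1 (not blank)."""
    required = sum(1 for tls in tls_list if tls[feature] in "01")
    return required / len(tls_list)


def variance_of_f01(tls_by_item: dict[str, list[str]], feature: int) -> float:
    """Population variance, across vocabulary items, of the per-item
    zero-or-one frequency of one feature."""
    per_item = [f01_item(tls, feature) for tls in tls_by_item.values()]
    mean = sum(per_item) / len(per_item)  # assumed: unweighted mean over items
    return sum((x - mean) ** 2 for x in per_item) / len(per_item)


# Hypothetical three-feature Transition Letter Sets for two words.
toy = {
    "zero": ["0 1", "00 ", " 01"],
    "one": ["1  ", "   "],
}
per_item_zero = f01_item(toy["zero"], 0)
spread = variance_of_f01(toy, 0)
```

In this toy case feature 0 is required in two of the three sets for "zero" and in one of the two sets for "one", so the variance across the two words is small.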


d. Average Number of Zero or One Zones (ANZ). Each feature tends to
vary rather regularly within a word, exhibiting zones where the feature is
reliably present or absent, bordered by zones where its occurrence is
unpredictable. The number of zones wherein the feature is either reliably
present or absent is an indication of structure as found for the vocabulary
item by GZEC.

\[ \mathrm{ANZ} = \frac{1}{|V|} \sum_{v \in V} Z_v \]

where $|V|$ is the number of vocabulary items, and

$Z_v$ is the number of zones of required presence or absence of the
feature, as exhibited in the Transition Letter Sets for vocabulary item $v$.


e. Average Number of Zero/One Zone Reversals (ANR). As an indicator of
the richness or complexity of the reliably occurring pattern of a feature
within a word, one can count the number of reversals between zones of required
absence and required presence of the feature, ignoring any intervening zones
where the feature is not reliably present or absent. The result is

\[ \mathrm{ANR} = \frac{1}{|V|} \sum_{v \in V} R_v \]

where $|V|$ is the number of vocabulary items, and

$R_v$ is the number of reversals between zones of required absence and
required presence of the feature, or vice versa, in the Transition
Letter Sets for vocabulary item $v$.
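
The zone and reversal counts of measures (d) and (e) can be illustrated on a single feature's history through the Transition Letter Sets of one word, written as a string of '0', '1' and blanks. This is only a sketch. It assumes that a zone is a maximal run of one non-blank symbol, so that adjacent runs of 0 and of 1 count as two zones; the report does not spell out that convention, and the function names are hypothetical.

```python
from itertools import groupby


def count_zones(column: str) -> int:
    """Number of maximal runs of '0' or of '1' in a feature's history."""
    return sum(1 for symbol, _run in groupby(column) if symbol in "01")


def count_reversals(column: str) -> int:
    """Number of 0-to-1 or 1-to-0 changes, ignoring intervening blanks."""
    required = [symbol for symbol in column if symbol in "01"]
    return sum(1 for a, b in zip(required, required[1:]) if a != b)


def average_over_items(columns_by_item: dict[str, str], counter) -> float:
    """Average a per-item count over all vocabulary items."""
    counts = [counter(column) for column in columns_by_item.values()]
    return sum(counts) / len(counts)


# Hypothetical histories of one feature in two words.
toy = {"zero": "0 0", "one": "0011 "}
anz = average_over_items(toy, count_zones)
anr = average_over_items(toy, count_reversals)
```

A history such as "0 0" has two zones but no reversal, which is why the two measures can rank features differently.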

f. Mean Log Probability of Acceptance (MLP). A feature which is almost
always absent but does reliably occur at some point within a vocabulary item
(or vice versa) is an effective rejection device for eliminating false
recognitions. A measure sensitive to this situation can be obtained by finding
$p$, the frequency with which a feature is present over all vocabulary items,
and computing the probability with which a random, uncorrelated sequence of
zeros and ones, wherein the ones occur with frequency $p$, would be accepted by
the Transition Letter Sets for that word. MLP is the negative natural logarithm
of that probability, and can be computed from

\[ \mathrm{MLP} = -\frac{1}{|V|} \sum_{v \in V} \left[ \#_{0,v} \log (1 - p) + \#_{1,v} \log p \right] \]

where $|V|$ is the number of vocabulary items,

$\#_{0,v}$ and $\#_{1,v}$ are the numbers of required zeros and required ones
for the feature in the Transition Letter Sets for vocabulary item $v$, and

$p$ is the average frequency with which the feature occurs in all
vocabulary items.
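
The mean log probability measure in (f) can be sketched as below. The per-word counts of required zeros and ones are passed in as pairs; the function name `mlp`, the input layout and the example numbers are hypothetical, and a single overall frequency p is assumed, as the accompanying definition states.

```python
import math


def mlp(counts_by_item: dict[str, tuple[int, int]], p: float) -> float:
    """Mean negative log probability that a random 0/1 sequence, with ones
    occurring at frequency p, satisfies each word's required pattern.

    counts_by_item maps a word to (required zeros, required ones).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for n_zero, n_one in counts_by_item.values():
        # log of (1 - p)^n_zero * p^n_one for this word
        total += n_zero * math.log(1.0 - p) + n_one * math.log(p)
    return -total / len(counts_by_item)


# A rarely present feature (p = 0.01) that is required once in a word ...
rare_but_required = mlp({"zero": (0, 1)}, p=0.01)
# ... versus the same feature merely required to be absent once.
rare_and_absent = mlp({"zero": (1, 0)}, p=0.01)
```

The first case gives a much larger value than the second, reflecting the remark that a usually absent feature which is reliably required somewhere makes a strong rejection device.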

Evaluation  and  Analysis  of  Feature  Quality  Measures.  Figure  3  shows  estimates 
obtained  for  the  six  quality  measures  described  above.  Each  quality  measure 
is  a  non-negative  number,  with  higher  values  suggesting  greater  utility  of 
the  feature  for  speech  recognition  purposes. 

As described earlier, each of these measures is an ad hoc construction
based on an intuitive concept of which characteristics of a feature might
indicate utility for recognition. If some of the measures evaluated are in fact
reliable and accurate indicators of utility for recognition, then it would be
expected that significant correlation would appear among those measures.
Unfortunately, perusal of the data in Figure 3 shows poor correlation between
all pairs of quality measures. This observation is borne out by the data in
Figure 4, which shows the coefficient of correlation and coefficient of
determination (the square of the coefficient of correlation) between all pairs
of quality measures. Reliable pairs of indicators would exhibit a large
positive coefficient of correlation and coefficient of determination (both near
+1). Since none do, there is at most one reliable and accurate quality
indicator among those used.
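
The pairwise comparison of quality measures can be sketched as follows: for every pair of measures, compute the coefficient of correlation over the features and square it to obtain the coefficient of determination. The function names and the numbers in the example are hypothetical and are not the values reported in Figures 3 and 4.

```python
import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Coefficient of correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_agreement(measures: dict[str, list[float]]) -> dict:
    """Map each pair of measure names to (correlation, determination)."""
    result = {}
    for name_a, name_b in combinations(measures, 2):
        r = pearson_r(measures[name_a], measures[name_b])
        result[(name_a, name_b)] = (r, r * r)
    return result


# Hypothetical values of three measures over four features.
toy = {
    "M1": [1.0, 2.0, 3.0, 4.0],
    "M2": [2.0, 4.0, 6.0, 8.0],   # rises with M1
    "M3": [4.0, 3.0, 2.0, 1.0],   # falls as M1 rises
}
table = pairwise_agreement(toy)
```

Two measures that ranked the features consistently would show a correlation near +1; a strongly negative correlation also gives a determination near 1 but indicates disagreement, which is why both numbers are examined.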

The lack of consistency among all pairs of suggested quality measures is
quite remarkable, in view of the rational and intuitively appealing basis for
each of the individual measures. It appears that no two of the measures are
reliable and accurate indicators of feature utility. It remains possible,
however, that a consensus (if one exists) of the measures may be indicative of
feature utility. This possibility was investigated as described in the
following paragraphs.

Each of the tentative quality measures establishes an order of preference
(mathematically speaking, a partial order) on the set of features. This
preference structure is shown in Figure 5. In that figure, a feature in a main
vertical column is preferred over any other feature below it in the main
column. Features offset to the right are all preferred equally to the feature
immediately above in the main column. Features in the Initial Feature Set are
marked with an asterisk.

[Figure 3. Estimates of Individual Feature Quality Measures. Table body
illegible in the source. Footnote: feature same for VIP-100 and TTI-500 (LP^ is
not shown).]

[Figure 4. Coefficients of Correlation and Determination Between Pairs of
Feature Quality Measures. Table body illegible in the source.]

A useful concept for dealing with incompatible orders or preference
structures is Pareto optimality. In this application, a set of fifteen features
is Pareto optimal if there is no other set of fifteen features which is
preferable under each of the six preference structures. (A set S is preferable
to a set S' if and only if each member of S is not less preferable than any
member of S', and some member of S is definitely preferable to some member of
S'.) Starting with any set of features, one may derive from it a Pareto optimal
set by examining its elements one-by-one to determine if any feature not in the
set is uniformly at least as preferred under each quality measure, and
definitely preferred under at least one quality measure. If so, that element is
replaced by the preferred one, and the process is repeated until no further
change takes place.
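
The replacement procedure just described can be sketched as follows. This is a minimal illustration, not the program used in the study: it assumes each quality measure can be expressed as a numeric score (higher meaning more preferred), whereas the report's measures define partial orders, and the names `dominates` and `pareto_improve` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed representation: feature name -> one score per quality measure,
# where a higher score means "more preferred" under that measure.
Scores = Dict[str, Sequence[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as preferred as b under every measure
    and definitely preferred under at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_improve(selected: List[str], scores: Scores) -> List[str]:
    """Swap selected features for unselected ones that dominate them,
    repeating until no further change takes place."""
    current = list(selected)
    changed = True
    while changed:
        changed = False
        for i, feat in enumerate(current):
            for cand in scores:
                if cand not in current and dominates(scores[cand], scores[feat]):
                    current[i] = cand  # replace by the uniformly preferred feature
                    changed = True
                    break
    return current
```

Because every swap replaces a feature by one that dominates it, the loop cannot cycle and stops after finitely many exchanges.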

When this process is applied to the Initial Feature Set, it is found to
be very nearly Pareto optimal. For three members of this set of features there
is one uniformly preferred feature in the Complementary Set: B15 is the only
feature uniformly preferable to A7 and similarly preferable to AS; AS is the
only feature preferable to A9; three features, At, A5 and A15, are all uniformly
preferable to AM. The Initial Feature Set can therefore be made Pareto
optimal by replacing A7 or AS with B15, A9 with AS and AM with At, AS or A15.

An interesting consistency among these exchanges appears when the
acoustical meaning of the features is considered. Each of the features to be
replaced (A7 or AS, A9 and AM) is an indicator of high energy at some portion
of the spectrum, and the replacing features are mostly (B15, AS, and At, but
not A15) more complex indicators either of specific phonemes or more general
spectral characteristics, such as a positive energy slope over a range of
frequencies. It is tempting to infer that Pareto optimization, which in some
sense represents a consensus of the quality measures, reveals a preference for
the more complex features over the more basic spectral energy concentration
indicators. Some confidence in this interpretation, and the indications of
the Pareto optimization results in general, might be justified if the
Complementary Feature Set were found to be far from Pareto optimal.
Unfortunately, this is not the case. Carrying out the optimization process for
the Complementary Feature Set requires only that B1^ be replaced by B1 and B11
be replaced by A9, B1 or B10. The Complementary Feature Set is thus even more
nearly Pareto optimal than is the Initial Feature Set. This fact indicates
that the six putative quality measures are very incompatible and that any
subset of fifteen features is probably almost Pareto optimal.

The dismal failure of the quality measures to give clear indications of
differences among features and, in fact, to demonstrate anything at all, is an
indictment of any intuitive approach to evaluating feature utility. Apparently
a satisfactory evaluation of feature utility will have to await a more
penetrating analysis.

In the absence of any satisfactory indication of relative individual
feature utility for recognition, the Initial Feature Set was retained for use in
the remainder of the VIAS project. One virtue of this selection is that the
speech data gathered using this set of features, and results obtained with
them, extend the data set and results learned in other related projects (such
as the Laboratory Version AIC Training System), which use the Initial Feature
Set.

FEATURE SET VALIDATION (Task 1d). Selection of the Initial Feature Set for use
in the subsequent analyses in this project was validated by monitoring the
performance of the entire LISTEN speech processing system operating with that
selected set of features. The process of extracting speech characteristics
for two speakers (LHN and JEP) was monitored especially carefully to detect
any indication of individual feature peculiarity. Transition Letter Sets were
extracted as usual from ninety-six examples of the eleven-word LCSR vocabulary,
using the GEEC program. Each feature clearly contributes to the
recognizability of at least some vocabulary items, and most features display
regularity in most vocabulary items for both speakers. The Transition Letter
Sets obtained by GZEC are shown in Figures 1 and 2. Loop Letter Sets generated
for the same speech data also failed to reveal any anomalous characteristic of
any individual feature. The Loop Letter Sets indicated that the Transition
Letter Sets almost completely characterize the TTI-500 output for each
vocabulary item, as they did for the VIP-100 output. That is, Loop Letter Set
states are quite infrequently entered, most words being recognized through a
sequence of transitions from one Transition Letter Set state to the next.

The remainder of the voice data analysis process leading to the data base
needed for real-time recognition is not easily related to individual feature
characteristics. These processes include collecting data about the timing of
transition and loop sounds (states), violation and artifact (false alarm)
rates, etc. However, these were monitored and no peculiarities attributable
to, or suggestive of, individual feature anomalies were detected.

Although no specific anomalies were noted in the process of extracting
voice reference data from speech samples for these two speakers, the
recognition accuracy which LISTEN exhibited for them was significantly inferior
to that obtained for MWG using the VIP-100. As described in connection with
Example Set generation, the poor performance for JEP can be attributed to the
method of generating individual vocabulary items, but MWG's and LHN's voice
data were processed in functionally identical ways. It remains ambiguous,
therefore, whether the difference in recognition performance between these two
speakers is due to speaker peculiarities or speech preprocessor differences,
and if the latter, whether a different selection of features might lead to
better recognition accuracy. Unfortunately, project resources did not permit
resolution of this ambiguity.

SUMMARY OF TECHNOLOGY TRANSFER TASK RESULTS. Practical considerations forced
an indirect approach to choosing a set of features for use with LISTEN from
those available from the newer model speech preprocessor. An Initial Feature
Set was constructed following the preprocessor manufacturer's recommendations.
Six measures of individual feature utility were posited and evaluated, and the
results analyzed. The six measures were found to be pairwise incompatible to
a high degree. The Initial Feature Set was adopted for use in the remainder
of the study in the absence of any rationale for selecting another set,
because this extends the accumulated data and experience based on that set of
features.

In a qualitative sense, the previously developed LISTEN technology was
successfully transferred to the new preprocessor in all phases of the LISTEN
operation, including voice data collection, voice data analysis and reference
data generation, and real-time voice recognition, in the sense that no
qualitative change to LISTEN was required to achieve recognition. However,
inferior recognition performance for LHN, whose voice data were processed in
essentially the same way as MWG's, leaves it unclear whether LISTEN can obtain
similar performance with the two preprocessors. On a word basis (counting all
insertions, deletions and substitutions as errors) 95% recognition was obtained
for MWG and 89% for LHN, using Test data, without speaker feedback. This
marginal difference in performance could presumably be due to either speaker or
preprocessor differences.

CONTRIBUTION OF EACH INFORMATION SOURCE TO RECOGNITION

As described in Reference 1, the LISTEN speech recognition system has two
major subdivisions, implemented in programs MEX and MINT. MEX detects the
presence of segments of speech preprocessor output which exhibit the structural
characteristics of individual vocabulary items, and notifies MINT of these
potential word recognitions. MEX also, in this process, notes the presence of
certain non-fatal structural peculiarities (if any) of each potential
recognition. MINT then processes these data to distinguish between real
recognitions and artifactual ones. In doing so, MINT uses information of
various other kinds. Each of these information sources is discussed separately
in the following paragraphs.

CONTRIBUTION OF STRUCTURAL INFORMATION IN MEX. Structural data are used in
two ways in the LISTEN recognition procedure, as the description of MEX and MINT
just given shows. The expected structure of individual vocabulary items is
first used in MEX to detect the potential presence of that item in the incoming
stream. The first use of structural information is thus an initial detection
function; indicators of its contribution are the frequency with which
vocalizations of words are not detected, and the frequency with which artifacts
are generated. These data are presented in Figure 6 for Test data.

Figure 6. Missed and Artifactual Recognitions in MEX Output


The difference in MEX failure and artifact production rates, between the
two speaker/preprocessor combinations, is quite remarkable. The artifact
production rate for LHN is half that for MWG, at the cost of three times the MEX
rejection rate. The detection of the potential presence of a word in the speech
signal is primarily dependent upon the combined discrimination capabilities of
the preprocessor features and the Transition Letter Sets. Visual and
quantitative comparison of the Transition Letter Sets for these two speakers
fails to reveal any substantive difference. (For example, both speakers average
R.5 Transition Letter Sets per vocabulary item.) If there were significant
differences in the articulatory habits of the two speakers, for example in
enunciation precision, presumably the difference would be evident as structural
differences in their respective Transition Letter Sets. As there are no
apparent differences, it seems likely that the contrasting MEX failure and
artifact production rates are characteristic of the preprocessors, or at least
of the sets of features LISTEN accepts from the two preprocessors, and not due
to differences between the two speakers.

CONTRIBUTION OF OTHER INFORMATION SOURCES IN MINT. In the process of
detecting, by use of structural information, the potential occurrence of a
vocabulary item in the speech data stream, MEX also computes two measures of how
typical the time durations of various detected feature combinations are. MINT
thus receives from MEX notification of the occurrence of a potential recognition
of a particular type, start and end times of the potential recognition, an
indication of any detected structural peculiarities, and two indicators of
temporal peculiarity. Using the start and end times of the potential
recognition, MINT (in principle, at least) builds and operates upon a directed
graph representing the utterance. This directed graph consists of a Start and
an End node, together with one additional node for each potential recognition.
A pair of nodes is joined by a directed edge if and only if the start and end
times of the events are compatible with one node representing the event
immediately preceding the event represented by the other node. MINT then
computes the path through this directed graph, moving backwards from End to
Start, seeking the best explanation of what has been observed about the
utterance. In the process of doing this computation, MINT adds to the
structural violation and intraword timing data supplied by MEX, data about the
a priori probability that a potential recognition is real versus artifact,
about its coincidence in time with other potential recognitions, and about the
interword timing. All of these data are expressed numerically as a scaling
constant (-64) times the natural logarithm of the likelihood ratio for the
occurrence of what was actually observed. That is, the
i-th information source is summarized as a value

$$\Delta Q_i = -64 \, \ln \frac{\mathrm{Prob}(\text{observation} \mid \text{real})}{\mathrm{Prob}(\text{observation} \mid \text{artifact})}$$

Those information sources relating to individual potential recognitions
produce ΔQ values associated with nodes, and the interword timing data produce
ΔQ values associated with edges of the directed graph.
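
As a numerical illustration of the scaled log likelihood ratio, the sketch below evaluates it for given probabilities. The function name `delta_q` and the example probabilities are assumptions made for illustration; only the constant -64 and the form of the ratio come from the text.

```python
import math

SCALE = -64.0  # scaling constant quoted in the text


def delta_q(p_obs_given_real: float, p_obs_given_artifact: float) -> float:
    """Scaled natural-log likelihood ratio for one observation.

    Negative values favor "real"; positive values favor "artifact".
    """
    return SCALE * math.log(p_obs_given_real / p_obs_given_artifact)


# An observation ten times more likely under "real" than under "artifact"
# yields a value of about -147.4; an uninformative observation yields 0.
example = delta_q(0.10, 0.01)
```

With this sign convention, observations that favor a real recognition lower the cost of any path through the corresponding node or edge.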

The ΔQ values attached by MINT to nodes and edges of the graph of
potential recognitions are estimates of the scaled log likelihood ratios based
on statistical models of each information source. The parameters of these
statistical models are estimated from speech data during the voice data
generation process. Validity of these statistical models and estimation
procedures is examined in Tasks 5a and 6b, described later in this section. In
this task, attention is directed to determining how effectively each
information source, as represented by its associated ΔQ values, contributes to
the recognition procedure.

As shown in Reference 1, under suitable assumptions, the Bayes optimal
solution to the problem of deciding which path through the graph is best reduces
to the problem of finding the path with minimum sum of ΔQ values on nodes and
edges. Evaluating an information source's contribution to recognition thus
reduces to determining how effectively the ΔQ values help establish the correct
path through the graph as the one with minimum total cost. Although MINT
considers all possible paths through the graph of the utterance, correct
identification of the spoken words depends decisively on the ΔQ values attached
to two particular paths through the graph (when they both exist): the lowest
cost path of those which give the correct answer, and the lowest cost path of
those which give any incorrect answer. If those two paths exist, the correct
answer is found precisely when the total cost of the former is less than the
total cost of the latter. The effectiveness of the i-th information source,
in establishing a correct path as the chosen one, is thus indicated by the
difference between the sums of the ΔQ_i values along the best of the incorrect
paths and the best of the correct paths. Subtracting the latter from the
former gives a value which, when positive, indicates that the information
source in question is a productive contributor to selecting the correct path
but which, when negative, indicates that the information source is
counterproductive.
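
The minimum-cost path search can be illustrated with a small dynamic-programming sketch over a directed acyclic graph. This is an assumption-laden illustration rather than MINT's algorithm: the data layout (`node_cost`, `edges`) and the function `best_path` are hypothetical, and the sketch recurses forward from Start, whereas the text describes MINT as working backwards from End.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def best_path(node_cost: Dict[str, float],
              edges: Dict[str, Dict[str, float]],
              start: str = "Start",
              end: str = "End") -> Tuple[float, List[str]]:
    """Return (total cost, node list) of the minimum-cost path from start
    to end, where cost is the sum of node and edge values along the path."""

    @lru_cache(maxsize=None)
    def solve(node: str) -> Tuple[float, Tuple[str, ...]]:
        if node == end:
            return node_cost.get(end, 0.0), (end,)
        best: Tuple[float, Tuple[str, ...]] = (float("inf"), ())
        for nxt, edge_cost in edges.get(node, {}).items():
            tail_cost, tail = solve(nxt)
            total = node_cost.get(node, 0.0) + edge_cost + tail_cost
            if total < best[0]:
                best = (total, (node,) + tail)
        return best

    cost, path = solve(start)
    return cost, list(path)
```

Because costs may be negative, a shortest-path method that assumes non-negative weights would not be appropriate here; the acyclic structure of the utterance graph is what makes the simple recursion valid.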

The first measure used to evaluate the contribution of the i-th information
source is, for the reasons just given, defined to be

$$M_i = \Bigl(\sum \Delta Q_i\Bigr)_{\text{best incorrect path}} - \Bigl(\sum \Delta Q_i\Bigr)_{\text{best correct path}}$$
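
A minimal sketch of this measure follows. It assumes each path is available as a list of per-element records (one per node or edge) mapping an information source name to its ΔQ value; this representation, the source names and the function name `contribution` are illustrative assumptions.

```python
from typing import Dict, List

# Assumed layout: one dict per node or edge on a path, mapping an
# information-source name to the value that source attached there.
PathCosts = List[Dict[str, float]]


def contribution(best_incorrect: PathCosts,
                 best_correct: PathCosts) -> Dict[str, float]:
    """M for every information source: the sum along the best incorrect
    path minus the sum along the best correct path."""
    sources = set().union(*best_incorrect, *best_correct)
    return {
        s: sum(e.get(s, 0.0) for e in best_incorrect)
           - sum(e.get(s, 0.0) for e in best_correct)
        for s in sources
    }
```

A positive entry means the source pushed the decision toward the correct path; a negative entry means it pushed the other way.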

The  measure  of  information  source  contribution  just  defined  cannot  be  applied 
if  the  graph  of  the  utterance  does  not  contain  at  least  one  path  giving  a 
correct  result  and  at  least  one  path  giving  an  incorrect  result.  Although 
there  are  several  different  possible  reasons  for  such  a  situation  arising, 
only  one  has  been  observed  to  arise  commonly  in  practice.  That  is  the  failure 
of  MEX  to  detect  a  word  actually  spoken  and  inform  MINT  of  its  existence  as  a 
potential recognition. Failures of this type occur only when the word as
spoken does not have expected structural characteristics; i.e., when the word
exhibits extensive structural violation. These cases are much in the minority.

The measure M gives a value to the contribution of each information source
towards correct recognition in each utterance. To summarize the utility of
the information source over many utterances requires some approach to dealing
with the collection of M values for each utterance. One approach, adopted here,
is to present a graph of the cumulative distribution of observed M values.

The PASS program STATSUM computes the best correct and best incorrect
path through the graph of each utterance, and also the contribution of each
information source to the cost difference between these two paths, i.e., the
M value defined above.

Figures 7 and 8 show the M distributions for each information source, for
MWG and LHN, respectively. From these graphs one can obtain at a glance such
indicators as the fraction of cases wherein the information source was
counterproductive (i.e., the fraction of cases where M is negative) and such
qualitative features as evidence of peculiar clusters of cases.
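
The quantities read from such a plot can be computed directly from a list of M values, as in the sketch below. The function names and the sample values used to exercise them are hypothetical; no data from Figures 7 and 8 are reproduced.

```python
from typing import List, Tuple


def empirical_cdf(m_values: List[float]) -> List[Tuple[float, float]]:
    """Points (m, fraction of cases with M <= m) of the cumulative distribution."""
    xs = sorted(m_values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def fraction_counterproductive(m_values: List[float]) -> float:
    """Fraction of cases in which M is negative."""
    return sum(1 for m in m_values if m < 0) / len(m_values)
```

Clusters of cases appear in the cumulative distribution as steep segments, which is one reason the distribution is plotted rather than reduced to a single number.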

Since M can be interpreted as an estimate, computed in MINT, of the scaled
logarithm of the likelihood ratio for the correct path being in fact correct,
M values can be translated into odds that the correct path is in fact correct.
For example, an M value of 147.4 (that is, 64 times the natural logarithm of
10) corresponds to an estimate in MINT that, according to that particular
information source, the odds are 10-to-1 that the correct path, rather than the
best incorrect one, is in fact correct. Odds values are indicated in Figures 7
and 8.
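
The conversion from an M value to odds follows from the scaling constant of 64 on the natural logarithm. The short sketch below shows the arithmetic; the function name `m_to_odds` is an assumption for illustration.

```python
import math


def m_to_odds(m: float, scale: float = 64.0) -> float:
    """Odds (correct path : best incorrect path) implied by an M value,
    assuming M equals `scale` times the natural log of the odds."""
    return math.exp(m / scale)


odds_for_example = m_to_odds(147.4)  # approximately 10, i.e. 10-to-1
```

An M value of zero corresponds to even odds, and negative M values to odds below 1, i.e., against the correct path.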

Another interesting indication of the relative value of an information
source is the frequency with which it is the most "productive" of all the
sources considered, in the sense of differentiating most strongly (and
correctly) between the best correct and best incorrect explanations of an
utterance as indicated by a most positive M value. The complementary notion is
the frequency with which the information source is not the most
counterproductive one (i.e., does not have the most negative M value). An
information source which is essentially random and which takes on large values
would frequently be the most productive and also frequently be the most
counterproductive, as these terms are defined above. Therefore, both these
indications of information source quality should be considered simultaneously.

Using the data produced by STATSUM, it is possible to compute the fraction
of cases in which each particular information source is the most productive
and the fraction of cases in which it is not the most counterproductive.
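
These two fractions can be computed as in the following sketch. It assumes that every utterance supplies an M value for every information source and, as a simplification, treats the largest M in an utterance as "most productive" and the smallest as "most counterproductive" regardless of sign; ties go to whichever source is listed first. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def productivity_fractions(cases: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """For each source: (fraction of cases where it has the largest M,
    fraction of cases where it does NOT have the smallest M)."""
    n = len(cases)
    sources = sorted(set().union(*cases))
    most_productive = Counter(max(c, key=c.get) for c in cases)
    most_counter = Counter(min(c, key=c.get) for c in cases)
    return {s: (most_productive[s] / n, 1 - most_counter[s] / n) for s in sources}
```

Plotting the two fractions against each other for every source gives a two-dimensional summary of the kind the text describes for Figure 9.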

Figure  9  shows  these  figures  for  MWG's  and  LHN's  test  data,  in  the  form  of 
two-dimensional  plots.  The  same  data  are  given  in  tabular  form  in  Figure  10. 

As  can  clearly  be  seen  in  Figure  9,  there  is  a  definite  consistency  in  the 
productivity  of  each  information  source  for  both  speakers,  with  the  single 
exception  of  the  association  information  source.  If  one  uses  as  a  measure  of 
the  utility  of  an  information  source  the  sum  of  the  frequencies  with  which  it 
is  most  productive  and  not  least  productive,  the  following  ranking  (best  to 
worst)  of  information  sources  holds  for  both  speakers: 


a. Interword Timing

b. Violation Category

c. Intraword Timing (QT)

d. A priori, and Intraword Timing (QL)

with Association somewhere below Violation Category.

Another interesting single-valued measure of the contribution of each
information source can be obtained by computing the information contained in
the distribution of M regarding the selection of the correct path. To apply
the theory of information to this situation, one can model it as follows. Let
the two paths contending for choice (the best of the correct and the best of
the incorrect) be labeled A and B in the order they are discovered in MINT. As
this is entirely random labeling, the correct path has equal probability of
being path A or path B. MINT computes the total ΔQ along the two paths and
chooses the path with minimum value. The difference in path ΔQ values (say,
path A minus path B) will then be distributed as a random variable equal to S
times M, where S is a random variable with probability one half of being either
+1 or -1, and M is the difference in ΔQ values for the incorrect path minus the
correct path. The product SM is the value available in MINT, which may be
regarded as a signal received. The message sent is equivalent to designation
of which path (A or B) is the correct one, or equivalently, whether the value
of S is +1 or -1. The information content of the signal about the message sent,
according to Information Theory, is the entropy of the random variable SM minus
the entropy of the conditional random variable SM given S. This is a value
lying between zero and one bit of information (one bit being exactly enough
information to decide perfectly between the two alternatives). If the M value
were always positive, for instance, one could always recognize the correct path
as the one with least ΔQ value. The information content in that case is found
to be one. If M is distributed symmetrically about zero, its information
content is zero.
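
One way to estimate this information content from observed M values is sketched below. It is an assumption-based illustration, not the estimator used in the study: M values are grouped into bins of a chosen width, and the mutual information between S and the binned product SM is computed from the resulting empirical distribution. The function name and the `bin_width` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def information_content(m_values: Iterable[float], bin_width: float = 1.0) -> float:
    """Estimated bits of information that the signal S*M carries about S,
    where S is +1 or -1 with equal probability."""
    values = list(m_values)
    n = len(values)
    # Bins are centered on multiples of bin_width, so bin k mirrors bin -k.
    p = Counter(round(m / bin_width) for m in values)
    bits = 0.0
    for k in set(p) | {-j for j in p}:
        p_plus = p.get(k, 0) / n    # P(SM in bin k | S = +1) = P(M in bin k)
        p_minus = p.get(-k, 0) / n  # P(SM in bin k | S = -1) = P(M in bin -k)
        mix = 0.5 * (p_plus + p_minus)
        for cond in (p_plus, p_minus):
            if cond > 0:
                bits += 0.5 * cond * math.log2(cond / mix)
    return bits
```

The two limiting cases in the text are reproduced: M values that are all positive give one bit, and a sample symmetric about zero gives zero bits. Values closer to zero than half a bin width fall in the central bin and are treated as uninformative, so the result depends somewhat on the bin width chosen.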

The information content of an information source, defined above, has several
interesting properties. One of them is that it establishes an upper bound on
how successfully an information source can be used to select the right path,
i.e., to recognize what was spoken, regardless of the algorithm used to effect
the recognition.

The information content of each information source has been estimated from
the observed distribution of M values. The results are given in Figure 10.

These  data  tend  to  corroborate  the  ranking  given  to  each  of  the  information 
sources  earlier. 

The fraction of time that an information source gives a correct, i.e.,
productive, indication of the right path can be read from the cumulative
distribution of M values. A positive M value indicates a correct indication,
and a negative M value an incorrect one. These data are also summarized in
Figure 10 for each information source, and lend further evidence that the
information source ranking given earlier is correct. (Since there are only
eight violation categories, and violations are relatively rare, it often
happens that the two paths do not have potential recognitions which differ in
violation category. The M value in that case is zero, and the information
source is equivocal. The frequency with which this occurs is also given in
Figure 10.)

ANALYSIS  OF  RECOGNITION  ERRORS 

Two aspects of recognition error analysis covered here are the automatic
classification of errors by programs in the PASS and relating recognition errors
to information sources. The related analyses are discussed below.

AUTOMATIC CLASSIFICATION OF RECOGNITION ERRORS. In connected speech, many
possible explanations for the observed speech data are usually generated in an
attempt to recognize what was actually spoken. When a wrong explanation is
selected, the cause may be related to a large number of factors. This is
especially true in an algorithm like MINT which considers the entire complex of
potential recognitions and all plausible explanations for the entire utterance.
Classification of errors might at first seem like a hopeless task, as the
process can apparently go wrong in very many ways. However, when the
recognition system works even moderately well, most errors are found to belong
to a small collection of types. Simple deletions, insertions and one-for-one
substitutions, for example, comprise the majority of all errors. So
classification, and its automation, is not a hopeless goal.

A useful dichotomy of recognition failures distinguishes between those
cases where there does not exist a path through the graph of the utterance
which yields the correct string of vocabulary items, and those cases where
there does exist such a path. The former will be called "structural failures."

Structural failures are generally of two types. One, treated earlier,
results from MEX's failure to detect the potential presence of a word actually
spoken. The other type occurs when the correct potential recognitions are
present in the graph of the utterance, but MINT fails to consider a path
through them. This can occur only when the interword timing is so anomalous
as to exceed limits (set in MINT) on the time between potential recognitions
to be considered potential predecessors.

The  PASS  program  BIGMINT  recognizes  structural  failures  and  provides 
data  whereby  the  type  of  failure  involved  may  easily  be  determined. 

Misrecognitions involving a source of error other than structural failure
occur because some incorrect path through the directed graph of the potential
recognition has lower total cost than any correct path. By considering only
the best of the correct and the best of the incorrect paths, the locus of the
difficulty becomes apparent because even in utterances of several words, the
best of the correct and the best of the incorrect paths usually have much in
common, the difference existing only at a small portion of the utterance.

As an example of the simplification obtained by considering only the best
of the correct and best of the incorrect paths, consider the following. The
phrase "015." occurs in Test data for MWG. This utterance was misrecognized,
as there were five paths through the graph of the utterance with lower cost
than that of the correct path. These five paths corresponded to 201557, 20155,
015.7, 01557 and 0155, the last one having least cost. Examining the correct
path (015.) and the best of the incorrect paths (0155) on a node-by-node basis
shows that they entail the same first three nodes (not obvious from the
vocabulary items), differing only in the last node. The node-by-node analysis
shows that this is a case of simple substitution, and the four other incorrect
paths with costs less than the correct path are not informative, being present
only because of the anomalous properties of the final word and the end of the
utterance, anomalies already indicated by the comparison of best correct and
best incorrect path. This simplification is typical to the point of being
universal.

Comparison of the best correct and best incorrect path becomes impossible
if there is no correct path or no incorrect path. But when at least one
correct and one incorrect path through the utterance exist, the utterance can
be further categorized as entailing either a single or multiple branches at
which the two paths differ. The concept is illustrated in Figure 11.

When the difference between the best paths is the insertion or deletion
of contiguous words, the path difference is interpreted as a single branch
case, as in Figure 11(b).

The  distinction  between  single  branch  and  multiple  branch  categories  is 
useful  because  the  single  branch  group  is  amenable  to  further  subdivision  and 
because the multiple branch case is so rare. (It has not been observed to occur.)

Figure 11. Illustrating Single and Multiple Branch Differences Between
the Best Correct (Solid) and Best Incorrect (Dotted) Paths
Through the Graph of an Utterance


Classifying utterances on the basis of the best correct and best incorrect
paths thus gives rise to four categories of cases:

Category 0  No best correct path exists (structural failure)

Category 1  A single branch distinguishes the best correct and
            best incorrect paths

Category 2  Multiple branches distinguish best correct and best
            incorrect paths

Category 3  No best incorrect path exists
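
The four categories can be assigned mechanically once the two paths are available as sequences of node identifiers. The sketch below is an illustration under stated assumptions, not the STATSUM implementation: it uses Python's `difflib.SequenceMatcher` to align the two node sequences and counts each contiguous stretch of difference as one branch. The function names are hypothetical, and the alignment is a heuristic rather than a guaranteed minimal one.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def count_branches(correct: Sequence[str], incorrect: Sequence[str]) -> int:
    """Number of separate stretches at which the two node sequences differ."""
    matcher = SequenceMatcher(a=correct, b=incorrect, autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def category(correct: Optional[Sequence[str]],
             incorrect: Optional[Sequence[str]]) -> int:
    """Category 0-3 as listed above; None means that no such path exists."""
    if correct is None:
        return 0  # structural failure
    if incorrect is None:
        return 3
    return 1 if count_branches(correct, incorrect) <= 1 else 2
```

Paths are compared by node identity rather than by vocabulary item, since two different potential recognitions may carry the same word. An insertion or deletion of several contiguous words counts as one branch, in keeping with the interpretation given for Figure 11(b).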


Among the Category 1 cases one may further distinguish cases on the basis
of the number of nodes in each part of the differentiating branch. Writing
the number of nodes (words) in the correct branch on the right and the nodes
in the incorrect branch on the left, a type (0,1) utterance is one in which the
best incorrect path is formed by deleting one word in the correct utterance.
If the best incorrect path has, in fact, lower associated cost than the best
correct path, a simple deletion occurs. Similarly, a type (1,1) utterance is
one in which the error, or potential error, is a simple substitution. A type
(1,0) utterance is potentially a simple insertion, and a type (2,1) utterance
would potentially be a more complex type of substitution.
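
The type notation can be captured in a small lookup, sketched below. Only the four types named in the text are labeled; the label strings, the "unnamed" fallback and the function name are assumptions for illustration.

```python
from typing import Dict, Tuple

# (nodes in incorrect branch, nodes in correct branch) -> kind of potential error
NAMED_TYPES: Dict[Tuple[int, int], str] = {
    (0, 1): "simple deletion",
    (1, 1): "simple substitution",
    (1, 0): "simple insertion",
    (2, 1): "complex substitution",
}


def contention_type(n_incorrect: int, n_correct: int) -> Tuple[Tuple[int, int], str]:
    """Return the type pair and its name, or 'unnamed' for other combinations."""
    pair = (n_incorrect, n_correct)
    return pair, NAMED_TYPES.get(pair, "unnamed")
```

For the "015." example given earlier, the differing branch has one node on each side, so the utterance is of type (1,1).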


The PASS program STATSUM classifies all utterances (both those correctly
and those incorrectly recognized) according to Category and Type as defined
above, and the program SSPLOT prints various data about each classified group.
These data can be used to classify any set of utterances, including the subset
of utterances with errors, as SSPLOT indicates which of the classified
utterances were misrecognized. Classification of all utterances in this
uniform way is useful in that it shows both how commonly the various types of
contention arise in LISTEN, and the relative success MINT has in resolving each
type of contention.

The  results  of  classifying  test  data  for  MWG  and  LHN  in  this  way  are  shown 
in  Figure  12. 

The simpler forms of contention are found to be most common in LISTEN.
MEX's failure to spot the potential presence of a word, and simple insertion,
deletion or substitution of a single word cover the vast majority of cases
examined.

Correct recognition most frequently entails the resolution of which of
two alternative words to choose; i.e., resolution of a simple one-for-one
substitution problem. Furthermore, this is a most difficult problem to resolve,
as indicated by the relatively small fraction of these cases resolved correctly.
One probable reason for the difficulty in resolving substitution of like
numbers of words (type (1,1) and (2,2) controversies) is that the strongest
information source, interword timing, is relatively ineffectual in these cases.

MISRECOGNITIONS VIS-A-VIS INFORMATION SOURCES. The data produced by STATSUM,
comparing the best incorrect and best correct paths, permit detailed
examination of the course of each misrecognition. Quite often one peculiarity
of a troublesome word stands out in a misrecognized utterance, but the specific
nature of the peculiarity varies from utterance to utterance. While it is
easy to identify the most counterproductive information source in individual
utterances, it is not reasonable to summarize these results for misrecognition
cases only. An information source may correlate highly with both correct and
incorrect recognitions, if it has a random component large enough to dominate
all other information sources. To maintain a balanced view of an information
source, then, it is important to consider its influence on correct recognitions
as well as on misrecognitions. This was done in the analysis of the
contribution of each information source, summarized in the earlier Figures 6
through 10.

Another indication of the association of errors with information sources
can be obtained by counting the number of correctly and incorrectly recognized
utterances, categorized by which was the most productive and which the least
productive information source. The PASS program STATSUM provides data from
which these counts can easily be accumulated. Results obtained in that way
are presented in Figure 13.

Figure 13. Counts of Utterances Correctly and Incorrectly Recognized,
Categorized by Most Productive (Best) and Least Productive
(Worst) Information Source. Test data, Category 1, Types
(0,1), (1,0) and (1,1).

CRITICAL EXAMINATION OF INFORMATION SOURCE MODELS

In the decision theoretic model of the problem solved by MINT, each
information source is considered to provide one component of a complex
observation of the characteristics of the utterance. Solution of the problem
then rests on estimating the probability that the particular observed value
would arise, given various hypotheses about what was actually said. In the MINT
implementation of this solution, the observed values must be used as a basis
for estimating the logarithm of the likelihood ratio; i.e., the logarithm
of the ratio of the conditional probability that the observed value would
occur, given that the potential recognition is a real one, to the conditional
probability that the observed value would occur, given the potential
recognition is an artifact. (AQ^ as defined earlier.)
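The quantity just described can be written out as a minimal Python sketch.
The function name and the example probabilities are hypothetical and are not
taken from MINT. Summing the contributions of several information sources, as
in the usage lines, assumes the sources are treated as independent.

```python
import math


def log_likelihood_ratio(p_obs_given_real: float,
                         p_obs_given_artifact: float) -> float:
    """Log of P(observation | real) / P(observation | artifact)."""
    if p_obs_given_real <= 0.0 or p_obs_given_artifact <= 0.0:
        raise ValueError("both conditional probabilities must be positive")
    return math.log(p_obs_given_real) - math.log(p_obs_given_artifact)


# Hypothetical observations from two information sources for one
# potential recognition: (P(obs | real), P(obs | artifact)).
observations = [(0.8, 0.2), (0.3, 0.6)]

# Positive totals favor "real", negative totals favor "artifact".
total = sum(log_likelihood_ratio(r, a) for r, a in observations)
```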

The mechanism for converting an observed value to an estimate of the log
likelihood ratio entails a statistical model; specifically, a pair of
conditional distributions of the observable values, given they are descriptions
of either real or artifactual recognitions. These statistical models contain
distribution parameters which are estimated from Interim Test data, using
procedures appropriate to the nature of the data and the statistical models.
Recognition accuracy and theoretical soundness of the MINT algorithm both
require that these statistical models and parameters be reasonably
descriptive of the actual nature of speech data.

Each information source presents its own difficulties for statistical
modelling, but three issues can be identified which are of interest in
assessing the validity of each model:

a.  The independent variables must be properly identified.

b.  If a distribution shape has been assumed, it must fairly describe
the actual shape.

c.  The model must describe statistical characteristics which do
generalize from Interim Test data to new speech data.

a priori MODEL. The decision theoretic model of the problem solved in MINT
requires knowledge of the a priori probability that a particular hypothesis
(in this context a string of vocabulary items which potentially may have been
said) will arise. This a priori probability is the probability unconditioned
by any observation about the acoustic data as received by the preprocessor or
operated upon by MEX, except that the graph of the utterance admits of the
string; i.e., contains a path corresponding to the hypothesis.

It is assumed that the probability that a hypothetical path is in fact the
correct one, without consideration of any details of the individual potential
recognitions comprising the path, or of their mutual temporal relationship
(beyond that required to make them constitute a path through the graph) depends
only on the vocabulary items in the path. It is further assumed that the
a priori probability of correctness of the entire path is the product of
probabilities associated with each vocabulary item in the path. Finally, it
is assumed that the probability that a particular recognition of a particular
vocabulary item in a path is in fact real can be estimated from the relative
frequency of occurrence of that vocabulary item as a real-vice-artifactual
potential recognition in a large body of speech data.
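The chain of assumptions above can be sketched in Python as follows. This is
an illustration only, not the MINT computation: the function names and counts
are hypothetical, and reading "relative frequency" as real / (real +
artifactual) is an assumption. The product over the path is accumulated as a
sum of logarithms.

```python
import math


def real_probability(real_count: int, artifact_count: int) -> float:
    """Relative frequency with which potential recognitions of an item are real."""
    total = real_count + artifact_count
    if total == 0:
        raise ValueError("no potential recognitions observed for this item")
    return real_count / total


def path_log_prior(path, counts) -> float:
    """Log of the product of per-item probabilities along a path.

    path:   sequence of vocabulary items (and forms) on the hypothetical path.
    counts: mapping item -> (real_count, artifact_count) from a body of data.
    """
    log_prior = 0.0
    for item in path:
        p = real_probability(*counts[item])
        if p == 0.0:
            raise ValueError(f"item {item!r} was never observed as real")
        log_prior += math.log(p)
    return log_prior


# Hypothetical tallies, not data from the report.
example_counts = {"five": (30, 10), "nine": (30, 30)}
example_prior = path_log_prior(["five", "nine"], example_counts)
```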

Several links in this chain of assumptions are difficult or impossible to
justify on theoretical grounds, or even to test. In fact, the assumptions are
rationalizations for the way the a priori contribution to total cost is
actually computed in MINT, which was chosen because it is plausible and
computable at small cost in data storage and processing burden. However, the
evaluation of the a priori information source presented earlier shows that this
procedure results in a cost contribution which is productive more often than it
is counterproductive. The whole chain of assumptions is thus justified in that
it leads to a useful result. Two features of the a priori statistical model
which are amenable to test and verification are its dependence upon vocabulary
item, and stability of the relative frequency of real and artifactual potential
recognition for each vocabulary item type. To verify these aspects of the model,
the relative frequencies of occurrence of real and artifactual potential
recognitions for each vocabulary item are compared for Interim Test data and
Test data in Figure 14.

As Figure 14 shows, there is considerable variation in the relative rate
of occurrence of artifactual recognition for various vocabulary items,
justifying the use of vocabulary item as an independent variable. More
precisely, it is the form of the vocabulary item which is important and which
is used as the independent variable, in the sense that some vocabulary items
exist in an initial form and a non-initial form, and different a priori
statistics are stored for the two forms.

These data also show the stability of the artifact production rates,
indicating that rates estimated from Interim Test data remain valid for Test
data, thus presumably for all new speech data. Of course, artifact production
is dependent upon vocabulary content and frequency of occurrence of various
vocabulary items in the corpus of spoken material. Since Test and Interim
Test data have identical vocabularies and incidence of vocabulary items, the
generalization from Interim Test data to Test data is justified. However, if
LISTEN were to be used with a set of utterances wherein each item did not
occur a substantially equal fraction of the time, new a priori statistics
should be derived from artifact occurrence rates.

Data  used  in  the  analysis  of  the  a  priori  statistical  model  are  gathered 
using  the  PASS  program  STATSUM  and  printed  using  the  DOGLEG  option  in  LICVAT. 

VIOLATION CATEGORY MODEL. In the process of detecting the potential presence
of a vocabulary item in the speech stream, MEX notes several types of
deviations of the speech from the structure expected of that vocabulary item.
Eight types of structural violation are recognized, and they are described in
detail in Reference 1. Each type of structural violation is assigned a
violation category number, ranging from 1 through 8. Violation category 0
indicates that no structural violation was detected by MEX.

Violation category (0 through 8) is regarded in MINT as an observed
characteristic of the potential recognition, and the probability of
occurrence of a given violation category is modeled as depending only upon
violation category and whether the recognition is real or artifact. Dependence
upon vocabulary item is suppressed, primarily because very few examples of some
violation categories are observed for some vocabulary items, making it
impossible to estimate reliably the rate of occurrence of these violations for
real recognitions.

Figure 14. Artifact Production Rates. (The number of artifactual
recognitions divided by the number of times the
vocabulary item was spoken.)

Vocabulary Item Dependence for Artifactual Recognitions. The low rate of
occurrence of violations for real recognitions makes it impractical to
estimate its vocabulary item dependence with a reasonable sample size. However,
artifactual recognitions exhibit violations much more frequently, and the
vocabulary item dependence of their frequency can be estimated. It might,
therefore, be both practical and useful to use a model of violation occurrences
which treats violation as independent of vocabulary item for real recognition,
and dependent upon vocabulary item for artifactual recognitions.

To examine this potential improvement of the violation category model,
the variation of the frequency of occurrence of artifactual violation
categories with vocabulary item was evaluated, as shown in Figure 15. To prepare
that figure, the conditional probability that a given violation category would
occur, given that the recognition is artifactual and of a given vocabulary
item and form, was estimated using the frequency of that occurrence. The
maximum and minimum values of the probabilities estimated in that way, over all
vocabulary items, are shown in the figure. The average probability of
occurrence of each violation category (for all artifactual recognitions), found
by ignoring the vocabulary item dependence, is also shown there.
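The tabulation described for Figure 15 can be sketched in Python as follows.
This is not the DOGLEG code; the function name and record layout are
assumptions, and the nine categories (0 for no violation, plus the eight
violation types) follow the description of the violation category model.

```python
from collections import Counter, defaultdict

N_CATEGORIES = 9  # assumed: category 0 (no violation) plus eight violation types


def violation_rates(records):
    """Estimate P(category | artifact, item) from artifactual recognitions.

    records: iterable of (vocabulary_item, violation_category) pairs.
    Returns (per_item, pooled, spread), where spread[k] is the (min, max)
    of the per-item estimates for category k.
    """
    by_item = defaultdict(Counter)
    for item, category in records:
        by_item[item][category] += 1
    if not by_item:
        raise ValueError("no records supplied")

    per_item = {}
    pooled_counts = Counter()
    for item, counts in by_item.items():
        n = sum(counts.values())
        per_item[item] = [counts[k] / n for k in range(N_CATEGORIES)]
        pooled_counts.update(counts)

    # Pooled estimate: vocabulary item dependence is ignored.
    total = sum(pooled_counts.values())
    pooled = [pooled_counts[k] / total for k in range(N_CATEGORIES)]

    spread = [
        (min(p[k] for p in per_item.values()),
         max(p[k] for p in per_item.values()))
        for k in range(N_CATEGORIES)
    ]
    return per_item, pooled, spread
```

A wide gap between the minimum and maximum for a category, relative to the
pooled value, is the kind of vocabulary item dependence the figure is meant
to reveal.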

The data in Figure 15, collected using the DOGLEG option in PASS program
LICVAT, show that for many violation categories there is a significant
vocabulary item dependence in the frequency of occurrence. Therefore, extension
of this model to include vocabulary item as an independent variable has definite
potential to increase the effectiveness of this information source.

Stability. The stability of the rate of occurrence of violation categories
was evaluated by comparing the frequency of occurrence of violations in Interim
Test data with their frequency in Test data. The results, also collected using
the DOGLEG option in LICVAT, are shown in Figure 16. The data show that
violation occurrence rates can be estimated safely using Interim Test data, for
both real and artifactual recognitions.

INTRA-WORD  TIMING  MODEL.  During  the  recognition  process,  MEX  notes  the  time 
spent  in  each  state  of  the  recognition  automaton.  A  measure  of  how  typical 
the  loop  state  durations  are  is  accumulated  as  a  linear  combination  of  the 
time  spent  in  each  loop  state.  The  resulting  value  is  denoted  QL,  and  is 
treated  in  MINT  as  an  observation  associated  with  the  potential  recognition. 

QL  is  a  non-negative  number.  The  coefficients  of  the  linear  form  used  in 
computing  QL  are  obtained  from  Training  data,  and  the  computational  procedure 
(forming  the  linear  combination)  is  based  on  a  model  of  the  joint  distribution 
of  the  individual  loop  state  durations,  as  described  in  Reference  1.  MINT 
itself  uses  a  model  of  the  distribution  of  QL  values  which  is  quite  independent 
of  the  model  upon  which  the  computation  of  QL  is  based. 

In MINT it is assumed that QL is distributed exponentially over positive
values, with a mass concentration at zero, in a way which depends upon the
vocabulary item type and form, and whether the potential recognition is real
or artifactual. The parameters of the distribution (the probability that the
value will be zero, and the parameter of the exponential portion of the
distribution) are estimated from the distribution of QL values observed over
Interim Test data.

If  QL  is  in  fact  distributed  in  the  modified  exponential  manner  assumed, 
the  computation  of  the  log  likelihood  ratio  performed  in  MINT  is  accurate.  It 
is  therefore  of  interest  to  determine  the  validity  of  this  assumption. 
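Under the modified exponential assumption, the log likelihood ratio for an
observed QL value can be sketched in Python as follows. The class and function
names are hypothetical and the code is not taken from MINT. The density over
positive values is taken to be (1 - p0) * lam * exp(-lam * q), which is the
normalization implied by a mass p0 at zero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedExponential:
    p0: float   # probability that the value is exactly zero
    lam: float  # parameter (rate) of the exponential portion

    def __post_init__(self):
        if not 0.0 < self.p0 < 1.0 or self.lam <= 0.0:
            raise ValueError("need 0 < p0 < 1 and lam > 0")

    def log_weight(self, q: float) -> float:
        """Log of the point mass at zero, or of the density for q > 0."""
        if q < 0.0:
            raise ValueError("QL is non-negative")
        if q == 0.0:
            return math.log(self.p0)
        return math.log(1.0 - self.p0) + math.log(self.lam) - self.lam * q


def ql_log_likelihood_ratio(q: float,
                            real: ModifiedExponential,
                            artifact: ModifiedExponential) -> float:
    """Log likelihood ratio (real versus artifact) for an observed QL value."""
    return real.log_weight(q) - artifact.log_weight(q)
```

In practice a separate pair of parameter sets would be held for each
vocabulary item and form, since the text states that both parameters depend
on them.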


As  the  mass  concentration  at  zero  and  the  parameter  of  the  exponential 
part  of  the  QL  distribution  are  observed  to  be  vocabulary  item  dependent,  it 
is  desirable  to  normalize  observed  QL  distributions  with  respect  to  these 
parameters  in  order  to  avoid  detailed  consideration  of  two  dozen  distributions 
for  each  speaker.  For  this  reason,  the  QL  distributions  have  been  linearized 
as  described  in  the  following  paragraph. 


A large set of independent samples of a random variable, distributed as
QL is assumed to be distributed, can be converted to a set of numbers which
are approximately uniformly distributed in the interval (0,1). To do this,
first put the QL values in increasing order and assign running index
i = 1,...,N to these values. For each i, replace the i-th QL value ($QL_i$) by

$$
f_i =
\begin{cases}
\dfrac{i}{N} & \text{if } QL_i = 0 \\[1ex]
1 - (1 - p_0)\, e^{-\lambda QL_i} & \text{if } QL_i > 0
\end{cases}
$$

where

$p_0$ is the probability that QL = 0

$\lambda$ is the parameter of the exponential portion of the QL distribution

If $p_0$ and $\lambda$ are in fact the correct parameters of the QL distribution,
and if QL has the assumed distribution shape, the resulting set of numbers
approaches a uniform distribution on (0,1) for large N. Correctness of the
assumed distribution shape, and of the parameters $p_0$ and $\lambda$, can then
be checked by plotting the cumulative distribution of the $f_i$ values. If the
distribution shape and parameters are correct, a straight line will result.
More importantly, sets of QL values for different vocabulary items and for real
and artifactual recognitions can be converted to sets of f values using the
parameters appropriate to each set, and the sets of f values can be merged. The
resulting large set of data will be uniformly distributed on (0,1) if the model
and the parameters for individual vocabulary items and types of recognition are
correct. A single graph thus checks the model and parameter validity for the
whole vocabulary.
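The linearization and the straight-line check can be sketched in Python as
follows. This is an illustration, not the STATSUM or QLPLOT code. Two points
are assumptions: zero values are spread over the lower end of the interval by
their rank, i/N, and positive values are mapped through the assumed cumulative
distribution. The deviation measure is a swapped-in numerical stand-in (a
Kolmogorov-Smirnov style distance) for the visual comparison with the graph
diagonal.

```python
import math


def linearize_ql(values, p0: float, lam: float):
    """Convert QL values to f values that should be uniform on (0,1)."""
    ordered = sorted(values)
    n = len(ordered)
    f_values = []
    for i, q in enumerate(ordered, start=1):
        if q == 0:
            f_values.append(i / n)  # assumed rank-based spread of the zeros
        else:
            # Assumed model CDF for q > 0: 1 - (1 - p0) * exp(-lam * q)
            f_values.append(1.0 - (1.0 - p0) * math.exp(-lam * q))
    return f_values


def max_deviation_from_diagonal(f_values) -> float:
    """Largest gap between the empirical CDF of f and the diagonal."""
    ordered = sorted(f_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no f values supplied")
    return max(
        max(abs((k + 1) / n - x), abs(k / n - x))
        for k, x in enumerate(ordered)
    )
```

Sets of f values computed with different (p0, lam) pairs, one pair per
vocabulary item and recognition type, can be concatenated before calling
`max_deviation_from_diagonal`, which mirrors the merging step described above.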

The PASS program STATSUM performs the conversion of QL values to f values
just described, and the QLPLOT function in LICVAT generates a computer graph
of the cumulative distribution of the merged f value sets. Results obtained
in this way are presented in Figures 17 and 18.

Figure 18. Computer Generated Plots of the Cumulative Distribution of f. Artifactual Recognitions
(Panels: LHN Interim Test Data; LHN Test Data)

The graph of the cumulative distribution for f deviates significantly
from a straight line for speaker MWG, but is quite reasonably straight for
speaker LHN. This is probably due to different procedures used to generate
the exponential parameters ($\lambda$) for each vocabulary item for the two
speakers. For MWG, the $\lambda$ values were estimated by visually comparing
computer plotted cumulative QL distributions with exponential curves of known
parameter. The distortion noted in the upper right hand portion of MWG's f
distributions is what would be expected if there were a systematic bias towards
estimating too high a value for $\lambda$. The fact that the lower left portions
of those curves tend to be straight lines coincident with the graph diagonal
indicates that the $p_0$ values are correct. They were estimated as the fraction
of observed zero values and were thus not subject to human error as were the
$\lambda$ estimates. In contrast, both $\lambda$ and $p_0$ estimates were derived
objectively for LHN's voice, using programs in the VDGS.

The difference just noted between the two speakers' data suggests that
the QL information source could be improved for MWG by re-estimating the
$\lambda$ parameters for each vocabulary item, using the unbiased mathematical
procedure.

These graphs indicate that the assumed exponential shape fits the
distribution of non-zero QL values quite well. If the data were distributed in
some other way, with the parameter $\lambda$ chosen to obtain best fit to the
data, the curves would have an ogival shape rising above and falling below the
graph diagonal in the upper right hand portion of the graph. The graphs also
indicate that the parameters obtained from Interim Test data are descriptive of
other speech data, as indicated by the similarity of the curves for Interim Test
and Test data. Thus the QL statistical model appears stable.

ASSOCIATION MODEL. In an effort to exploit the fact that speaking certain
vocabulary items may have a tendency to cause artifactual recognition of
another vocabulary item, MINT detects and uses the temporal association of
potential recognitions. If there is significant asymmetry in the rates of
artifact production (for example, if speaking "five" usually causes artifactual
recognition of "nine," while speaking "nine" seldom produces artifactual
recognition of "five") association may carry information useful for recognition.

A set of associated vocabulary items and forms is ascribed to each
potential recognition for this purpose. A vocabulary item is associated with a
given potential recognition if there is another potential recognition of that
vocabulary item type which overlaps sufficiently in time. The required amount
of overlap (called the association criterion) is determined as described in
Reference 2. Only the existence or non-existence of associated recognitions
of each vocabulary item is noted, not their number.
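The construction of the set of associated items can be sketched in Python as
follows. The record layout, the function names, and in particular the fixed
`min_overlap` threshold are assumptions made for illustration; the actual
association criterion is the one given in Reference 2 and is not reproduced
here. A set is used because only the existence of an association is noted,
not the number of associated recognitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialRecognition:
    item: str   # vocabulary item and form
    start: int  # start time, in counts
    end: int    # recognition time, in counts


def overlap(a: PotentialRecognition, b: PotentialRecognition) -> int:
    """Length of the time interval shared by two potential recognitions."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def associated_items(target, others, min_overlap: int):
    """Vocabulary items having at least one sufficiently overlapping recognition."""
    if min_overlap <= 0:
        raise ValueError("min_overlap must be positive")
    return {
        other.item
        for other in others
        if other is not target and overlap(target, other) >= min_overlap
    }
```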

The  probability  that  a  potential  recognition  will  have  an  associated 
potential  recognition  of  given  vocabulary  type  is  assumed  to  depend  upon  both 
vocabulary  items  and  forms,  and  whether  the  former  recognition  is  real  or 
artifactual. 

The PASS program STATSUM tallies the number of times each vocabulary item
and form is found to be associated with real and artifactual recognitions of
each vocabulary item and form, and the LICVAT option DOGLEG prints these data.
The probability that a real (or artifactual) recognition of given vocabulary
item and form will have an associated recognition of given vocabulary item and
form can be estimated directly from these tallies. The association data can
contribute to correct recognition when the probabilities for real and
artifactual recognition differ significantly. The natural measure of this
potential is the likelihood ratio.
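The estimate from tallies and the resulting likelihood ratio can be sketched
in Python as follows. The function name and the numbers in the usage lines are
hypothetical and are not data from the report.

```python
def association_likelihood_ratio(assoc_real: int, n_real: int,
                                 assoc_artifact: int, n_artifact: int):
    """Estimate association probabilities and their ratio from tallies.

    assoc_real:     real recognitions of the item that had the association.
    n_real:         all real recognitions of the item.
    assoc_artifact: artifactual recognitions that had the association.
    n_artifact:     all artifactual recognitions of the item.
    Returns (p_real, p_artifact, likelihood_ratio).
    """
    if n_real <= 0 or n_artifact <= 0:
        raise ValueError("need at least one real and one artifactual recognition")
    p_real = assoc_real / n_real
    p_artifact = assoc_artifact / n_artifact
    if p_artifact == 0.0:
        raise ValueError("ratio undefined: association never seen with artifacts")
    return p_real, p_artifact, p_real / p_artifact


# Hypothetical tallies: 30 of 100 real and 5 of 50 artifactual recognitions
# of some item were associated with a given other item.
p_real, p_artifact, ratio = association_likelihood_ratio(30, 100, 5, 50)
```

A ratio near one carries no information for distinguishing real from
artifactual recognitions; ratios far from one in either direction do.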

The assumed dependence upon vocabulary item of the associated recognition
is shown to be factual by the data in Figure 19. These data show the
estimated probability that various vocabulary items and forms will be associated
with real and artifactual recognitions of the word "five," as determined for
MWG test data. The likelihood ratio is seen to vary widely from unity.
However, if one averages over vocabulary items, it is found that both real and
artifactual fives have the same probability (.21) of having an associated
recognition of unspecified type, resulting in a likelihood ratio of one, and
no information for distinguishing real from artifactual recognitions. Similar
results can be demonstrated for vocabulary items other than "five." Including
vocabulary item dependence in the association model is, therefore, necessary
in order to extract the available information.

The data of Figure 19 also show that association of a potential
recognition of "five" with another recognition of any vocabulary item other than
"nine" yields information useful in distinguishing real from artifactual
recognitions. The exception is unfortunate, as "five"/"nine" discrimination is
difficult.

The stability of association statistics can be demonstrated by comparing
association frequencies observed for Interim Test and Test data. These
frequencies are shown in Figure 20 for LHN's enunciations of "point." The
frequency observed in Test data is plotted against the frequency observed in
Interim Test data to facilitate comparison.

INTERWORD TIMING MODELS. MINT uses three different models related to the
relative time of occurrence of potential recognitions within an utterance.
These three models treat the delay between the start of the utterance (sound
detected by the preprocessor) and the beginning of the first recognition, the
gap or overlap between successive words of the utterance, and the delay between
the recognition of the last word of the utterance and the cessation of sound.

Initial Delay Model. The delay between the beginning of the utterance and the
start time of the recognition of the first word in the utterance is assumed to
be distributed exponentially over positive values, with a mass concentration
at zero. (This distribution was suggested by examining many cases during
LISTEN's development.) The probability of a zero value and the parameter of
the exponential portion of the distribution are assumed to be dependent upon
vocabulary item and whether the recognition is really the first word spoken in
the utterance or not. (Thus for the initial delay model, recognition of the
second word actually spoken is an artifactual recognition of the first word
spoken.)

Variation of the distribution parameters with vocabulary item, and
stability of these statistics, are revealed by comparing estimates of the
parameters derived from Interim Test data with estimates taken from Test data.
Data for computing these estimates are provided by the PASS program STATSUM, and
printed by the GAP DATA option in LICVAT.

Figure 20. Comparison of the Frequency with which Various Vocabulary Items
Are Associated with Recognition of the Word "Point" in Interim
Test and Test Data for Speaker LHN.
(Axis label: Frequency of Association Observed in Interim Test Data)

Figure 21 shows the frequency with which zero initial delay was observed
for all vocabulary items, in Interim Test data and Test data. The wide
variation of these ratios for various vocabulary items indicates the importance
of using vocabulary item as an independent variable. This figure also shows
that the variation in rate of occurrence between vocabulary items is greater
than the variation from Interim Test to Test data, further validating the
dependence upon vocabulary item, and showing the stability of the statistics.
The wide disparity in frequency of zero delay observed for real and artifactual
recognition, and hence the utility of this information source, is also
apparent in this figure.

Figure 22 shows the mean of the non-zero initial delay observed in Interim
Test and Test data for all vocabulary items. The reciprocal of this value is
the maximum likelihood estimator of the exponential distribution parameter. The
time unit is one "count," the period of the interrupt signal from the
preprocessor, which is approximately two milliseconds. These data indicate
several interesting characteristics of the non-zero initial delay distributions.
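The two parameter estimates used by the initial delay model can be sketched in
Python as follows. The function names are hypothetical, and the conversion
constant uses the approximate figure of two milliseconds per count quoted
above.

```python
COUNT_MS = 2.0  # approximate duration of one count, in milliseconds


def estimate_delay_parameters(delays_in_counts):
    """Estimate (p0, rate) of the modified exponential delay distribution.

    p0:   fraction of observed delays equal to zero.
    rate: reciprocal of the mean of the non-zero delays (per count).
    """
    delays = list(delays_in_counts)
    if not delays:
        raise ValueError("no delays supplied")
    nonzero = [d for d in delays if d > 0]
    if not nonzero:
        raise ValueError("no non-zero delays: the rate cannot be estimated")
    p0 = (len(delays) - len(nonzero)) / len(delays)
    mean_nonzero = sum(nonzero) / len(nonzero)
    return p0, 1.0 / mean_nonzero


def counts_to_ms(counts: float) -> float:
    """Convert a time in counts to (approximate) milliseconds."""
    return counts * COUNT_MS
```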

First,  non-zero  initial  delays  are  very  much  larger  for  artifactual  than 
for  real  recognitions.  The  only  exceptions  to  this  rule  are  vocabulary  items 
of  initial  form;  for  those  vocabulary  items,  the  non-zero  initial  delays  for 
real  and  artifactual  recognitions  are  comparable.  (This  is  because  potential 
recognition  of  the  initial  form  of  a  vocabulary  item  is  only  allowed  by  MEX 
to  start  in  the  first  fifty  or  so  milliseconds  of  the  utterance.)  If  the 
distribution  of  non-zero  initial  delays  is  in  fact  exponential,  this  indicates 
that  artifact  non-zero  initial  delays  are  distributed  essentially  uniformly 
in  the  interval  where  there  is  any  reasonable  probability  of  a  delay  being 
due  to  a  real  recognition. 

Second, among non-initial artifactual vocabulary items, the variation of
mean non-zero initial delay with vocabulary item is not a large fraction of
the average value, and is comparable to the variability between Interim Test
and Test data. Combining this fact with the first observation, it appears that
the initial delay model could be simplified by assuming non-zero initial delays
for artifactual recognitions are distributed uniformly over the region of
interest, with a density which is independent of vocabulary items. From a
computational point of view, however, it turns out to be simpler to retain the
assumption that the distribution is exponential rather than uniform, but with a
distribution parameter which is independent of vocabulary item.

Third, the stability of the non-zero initial delay distribution for real
recognition and artifactual recognition of initial form is suspect. This is
almost certainly a problem of sample size, as several vocabulary items have
high probability of zero initial delay, leading to very few cases of non-zero
delay from which to estimate the mean. For example, in the corpus of
utterances used in this project, each data set (Training, Interim Test and Test)
contains thirty occurrences of each vocabulary item in the initial position
(including the six cases where the item is spoken in isolation). If the
probability of zero initial delay is 0.8, the expected non-zero delay sample
size is six. An extended study in this area might reveal an appropriate
simplification of this portion of the initial delay model as well.
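The sample size arithmetic above, and its consequence for the precision of the
estimated mean, can be sketched in Python as follows. The function names are
hypothetical. The second function relies on a standard property of the
exponential distribution (its standard deviation equals its mean), so it
applies only to the extent that the exponential assumption holds.

```python
import math


def expected_nonzero_samples(n_occurrences: int, p_zero: float) -> float:
    """Expected number of non-zero delays among n_occurrences observations."""
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("p_zero must be a probability")
    return n_occurrences * (1.0 - p_zero)


def relative_standard_error_of_mean(n_samples: float) -> float:
    """Standard error of the sample mean divided by the mean, exponential data."""
    if n_samples <= 0:
        raise ValueError("need a positive sample size")
    return 1.0 / math.sqrt(n_samples)


n = expected_nonzero_samples(30, 0.8)        # about six samples
spread = relative_standard_error_of_mean(n)  # about 0.41
```

With about six samples the estimated mean is uncertain by roughly forty
percent, which is consistent with the instability noted in the text.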
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Mean  of  Non-zero  Initial  Delays  Observed  in 
Interim  Test  Data  (Counts) .  Speaker  MWG. 
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Mean  of  Non-zero  Initial  Delays  Observed  in 
Interim  Test  Data  (Counts) .  Speaker  LHN . 


Figure  22.  Graphs  Showing  Mean  of  Non-zero  Initial  Delay 
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Final  Delay  Model.  The  interval  between  the  time  of  recognition  of  the  last 
word  spoken  in  an  utterance  and  the  preprocessor's  detection  of  the  cessation 
of  speech  is  assumed  to  be  distributed  exponentially,  with  distribution  param¬ 
eter  depending  upon  vocabulary  items  and  whether  or  not  a  potential  recognition 
is  really  the  last  word  spoken.  (Thus  "artifactual  last  words"  include  all 
artifactual  recognitions  and  all  real  recognitions  of  words  other  than  the 
last.)  The  PASS  program  STATSUM  accumulates  end  delay  values  and  averages 
those  data  for  each  vocabulary  item  and  recognition  type  and  the  GAP  DATA 
option  of  LICVAT  prints  the  results.  These  data  are  shown  in  Figure  23  for 
all vocabulary items. (The reciprocal of the average delay is the maximum
likelihood estimator of the exponential distribution parameter, though not an
unbiased one for small samples.) Two different scale
factors have been used in this figure to increase visibility of certain
features of the data. The unit of time used is one "count", about two
milliseconds.


These  data  show  that  the  final  delay  has  significant  vocabulary  item 
dependence,  and  that  the  variation  with  vocabulary  items  is  considerably 
larger  than  the  variation  from  Interim  Test  data  to  Test  data,  for  both  real 
and  artifactual  recognitions.  Therefore,  unlike  the  initial  delay  model,  the 
final  delay  model  cannot  be  simplified  by  suppressing  vocabulary  item  depend¬ 
ence  without  sacrificing  information. 
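
As a rough illustration of the bookkeeping described for the final delay
model, the sketch below groups end delays by vocabulary item and recognition
type and takes the reciprocal of each average. It is not the STATSUM code;
the function names and the observations are hypothetical, and only the
two-millisecond count comes from the text.

```python
from collections import defaultdict

COUNT_MS = 2.0  # one "count" is about two milliseconds, per the text


def accumulate(observations):
    """Group end delays (in counts) by (vocabulary item, recognition type)."""
    groups = defaultdict(list)
    for item, kind, delay in observations:
        groups[(item, kind)].append(delay)
    return groups


def exponential_parameter(delays):
    """Reciprocal of the average delay, in units of 1/count."""
    if not delays:
        raise ValueError("no delays observed for this group")
    mean = sum(delays) / len(delays)
    if mean <= 0:
        raise ValueError("average delay must be positive")
    return 1.0 / mean


# Hypothetical observations: (item, "real" or "artifact", end delay in counts).
observations = [
    ("six", "real", 20), ("six", "real", 30), ("six", "real", 10),
    ("six", "artifact", 150), ("six", "artifact", 250),
]

for (item, kind), delays in sorted(accumulate(observations).items()):
    mean = sum(delays) / len(delays)
    print(
        f"{item:6s} {kind:9s} mean {mean:6.1f} counts "
        f"({mean * COUNT_MS:.0f} ms), parameter "
        f"{exponential_parameter(delays):.4f} per count"
    )
```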


Interword  Gap  Model.  The  time  interval  between  the  end  (recognition  time)  of 
one  potential  recognition  and  the  beginning  (start  time)  of  another  is  assumed 
to  be  distributed  in  a  symmetric  limited  exponential  manner.  That  is,  the 
probability  density,  as  a  function  of  the  interword  gap  g  is  assumed  to  be  of 
the  form: 


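$$
p(g) =
\begin{cases}
\dfrac{1}{4d}, & \text{if } |g-\mu| < d \\[6pt]
\dfrac{1}{4d}\, e^{-(|g-\mu|-d)/d}, & \text{if } |g-\mu| > d
\end{cases}
$$

where μ and d are parameters of the distribution. (Only the uniform branch is
legible in the source. The exponential branch shown is the form fixed by
continuity at |g - μ| = d together with the twenty-five percent of mass in
each exponential segment noted later in this section.) These parameters are
assumed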
to  depend  upon  the  vocabulary  item  and  form  of  the  two  potential  recognitions, 
and  on  whether  they  are  really  recognitions  of  contiguous  spoken  words  taken 
in  correct  order,  or  otherwise.  The  time  interval  between  two  potential  recog¬ 
nitions  is  thus  considered  artifactual  if  the  first  is  treated  in  MINT  as  a 
potential  predecessor  of  the  second,  but  they  are  not  both  real  recognitions 
of  contiguously  spoken  words. 
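
A minimal sketch of a symmetric limited exponential density follows. The
uniform centre of height 1/(4d) is taken from the text; the exponential tails
with decay length d are an assumption, chosen so that the density is
continuous and each tail carries one quarter of the mass. The parameter
values are hypothetical.

```python
import math


def gap_density(g, mu, d):
    """Uniform within d of mu, exponentially decaying outside (assumed form)."""
    if d <= 0:
        raise ValueError("d must be positive")
    z = abs(g - mu)
    if z <= d:
        return 1.0 / (4.0 * d)
    return math.exp(-(z - d) / d) / (4.0 * d)


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


mu, d = 10.0, 5.0  # hypothetical parameters, in counts


def density(g):
    return gap_density(g, mu, d)


central = integrate(density, mu - d, mu + d)
upper_tail = integrate(density, mu + d, mu + 40.0 * d)
print(f"mass in uniform portion:    {central:.4f}")
print(f"mass in upper exponential:  {upper_tail:.4f}")
```

The numerical check is only there to confirm that the assumed tails reproduce
the half-and-two-quarters split of probability mass.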

With a dozen vocabulary items, this model requires considering a gross (144)
of vocabulary item pairs. Since each Magic Number Set of 55 utterances
contains each sequential pair of vocabulary items exactly once ("point-point"
was excluded), the Training, Interim Test and Test data sets each contain six
examples of each interword gap distinguished by the model. Statistical sample
size is thus a serious problem in estimating the distribution parameters μ
and d for each pair of vocabulary items.
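
The pair counts quoted above can be checked with a few lines. Only "point" is
named as a vocabulary item here; the other eleven labels are placeholders, and
the figure of six Magic Number Sets per data set is taken from the appendix.

```python
from itertools import product

# Eleven placeholder labels plus "point"; the placeholders are hypothetical.
items = [f"w{i:02d}" for i in range(1, 12)] + ["point"]

all_pairs = list(product(items, repeat=2))
spoken_pairs = [pair for pair in all_pairs if pair != ("point", "point")]

SETS_PER_DATA_SET = 6          # six Magic Number Sets in each data set
OCCURRENCES_PER_SET = 1        # each sequential pair appears exactly once
examples_per_pair = SETS_PER_DATA_SET * OCCURRENCES_PER_SET

print(f"ordered pairs considered by the model: {len(all_pairs)}")
print(f"pairs actually spoken in each set:     {len(spoken_pairs)}")
print(f"examples of each pair per data set:    {examples_per_pair}")
```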

The small number of available interword gap samples also makes it difficult
to validate treating vocabulary items as an independent variable in the
interword  gap  model.  Some  justification  for  considering  vocabulary  items  in 
modelling  real  interword  gaps  can  be  taken  from  the  fact  that  data  tend  to 
have  certain  trends  which  would  be  expected  on  a  phonological  basis.  For 
example,  in  the  "six-six"  case,  one  expects  overlap  due  to  the  identical  termi¬ 
nal  and  initial  sound  of  the  words  involved.  Similarly,  one  expects  overlap 
of  word  pairs  which  share  a  stop,  such  as  "eight-two."  Word  pairs  which  entail 
dissimilar  sounds  at  their  juncture,  such  as  "seven-point"  are  expected  to 
have larger than average interword gaps. The observed mean values of interword
gaps, subjectively at least, seem to exhibit many of these anticipated
tendencies,  as  is  demonstrated  in  Figure  24.  These  data  were  obtained  by  the 
VDGS  program  GAPSTER,  for  speaker  MWG's  Interim  Test  data. 

It  is  much  less  likely  that  vocabulary  item  dependence  should  be  consid¬ 
ered  in  the  distribution  of  artifact  interword  gaps,  since  the  phonological 
argument  cannot  be  applied  to  artifactual  recognitions  or  non-contiguous  real 
recognitions.  Little  would  probably  be  lost  by  simplifying  the  model  by  sup¬ 
pressing  this  dependence,  but  it  is  impossible  to  demonstrate  that  as  fact 
with  available  data. 

In  an  attempt  to  evaluate  the  stability  of  interword  gap  statistics  and 
validity  of  the  assumed  distribution  shape,  the  following  procedure  was  used 
to  normalize  interword  gap  data.  A  derived  random  variable,  f,  can  be  com¬ 
puted  from  the  observed  random  gap  values,  g,  using  the  known  parameters  of 
the  distribution.  If  f  is  related  to  g  by 


$$
f(g) = \int_{-\infty}^{g} p(x)\, dx
$$

where p(x) is the probability density assumed for the gap data, f will be
uniformly distributed on (0,1) provided the assumed distribution is correct.
By using distribution parameters μ and d appropriate to the instance of g, all
f values thus found can be merged and their cumulative distribution plotted.
If the assumed distribution shape and parameters are correct, a straight line
will result.
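
The normalization just described can be sketched as follows. This is not the
STATSUM or QGPLOT code. The closed-form cumulative distribution assumes the
symmetric limited exponential shape with tails of decay length d (an
assumption, consistent with one quarter of the mass in each tail), and the
parameters and gap values are hypothetical.

```python
import math


def gap_cdf(g, mu, d):
    """f(g): integral of the assumed gap density from minus infinity to g."""
    x = g - mu
    if x < -d:
        return 0.25 * math.exp((x + d) / d)
    if x <= d:
        return 0.25 + (x + d) / (4.0 * d)
    return 1.0 - 0.25 * math.exp(-(x - d) / d)


def normalize_gaps(observations, params):
    """Map each (pair, gap) to f using that pair's own (mu, d), then merge."""
    return sorted(gap_cdf(gap, *params[pair]) for pair, gap in observations)


def max_deviation_from_uniform(f_values):
    """Largest vertical distance between the empirical CDF of f and a line."""
    fs = sorted(f_values)
    n = len(fs)
    return max(
        max(abs((i + 1) / n - f), abs(i / n - f)) for i, f in enumerate(fs)
    )


# Hypothetical parameters (mu, d) and observed gaps, all in counts.
params = {("six", "six"): (-10.0, 8.0), ("seven", "point"): (25.0, 12.0)}
observations = [
    (("six", "six"), -14.0), (("six", "six"), -6.0),
    (("seven", "point"), 20.0), (("seven", "point"), 41.0),
]

f_values = normalize_gaps(observations, params)
print("merged f values:", [round(f, 3) for f in f_values])
print("max deviation from straight line:",
      round(max_deviation_from_uniform(f_values), 3))
```

A small maximum deviation corresponds to the straight-line cumulative plot
expected when the assumed shape and parameters are correct.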

The  PASS  program  STATSUM  computes  the  normalizing  function  f  and  the 
option  QGPLOT  of  the  program  LICVAT  generates  a  computer  plot  of  the  cumulative 
distribution  of  f.  Graphs  taken  from  the  computer  plots  are  presented  in 
Figures  25  and  26.  Salient  features  of  these  data  are  discussed  in  the  follow¬ 
ing  paragraphs . 

The  graphs  for  Interim  Test  data  show  that  f  is  not  uniformly  distributed 
in  those  data,  even  though  the  distribution  parameters  used  in  computing  f  are 
taken  from  those  data.  The  shape  of  the  curves  can  be  explained  by  the  fact 
that  the  procedure  used  to  estimate  the  distribution  parameters  incorporates 
a  bias  toward  over-estimating  the  distribution  parameter  d  when  small  numbers 
of samples are available. As a result, the parameters for several vocabulary
item  pairs  describe  a  distribution  which  is  wider  than  the  data  indicate, 
leading  to  fewer  than  expected  f  values  near  zero  and  one.  Another  factor 
which  may  be  contributing  to  the  paucity  of  real  gap  cases  at  the  extremes 
would  be  that  the  distribution  shape  has  too  much  weight  in  the  exponential 
portions,  vice  the  uniform  portion.  (The  adopted  distribution  has  twenty-five 
percent  of  its  mass  in  each  exponential  segment.)  No  reasonable  explanation 
is  available  for  the  greater  deviation  of  the  f  distribution  from  linearity 
observed  for  speaker  MWG  than  for  speaker  LHN. 


The  distribution  of  real  gaps  is  very  stable  with  respect  to  speech 
sample, as can be seen by comparing the f distribution for real gaps shown for
Interim Test and Test data. This stability, and the similarity of the
distributions obtained for the two speakers, suggests that substantial
improvement in the modelling of real gaps can be obtained by reducing the small
sample  protection  bias  toward  large  d  values,  and  perhaps  changing  the  assumed 
distribution  shape  by  reducing  the  mass  in  the  exponential  portions. 

The  f  distribution  graphs  for  artifact  gaps  show  a  deviation  from  linear¬ 
ity  which  is  the  reverse  of  that  observed  for  real  gaps.  In  each  case,  more 
than  the  expected  number  of  f  values  are  found  near  zero  and  one,  and  fewer 
near  middle  values.  This  is  the  result  to  be  expected  when  the  gaps  are 
actually  distributed  more  or  less  uniformly  over  a  broad  interval,  including 
values  where  the  density  is  modelled  as  decreasing  exponentially.  It  is  a 
clear  indication  that  the  assumed  distribution  shape  is  not  appropriate  for 
artifact  gaps.  As  the  distribution  width  (indicated  by  the  parameter  d)  is 
much  greater  for  artifacts  than  for  real  gaps,  a  superior  model  for  artifact 
gaps would result from assuming that artifact gaps are uniformly distributed
over  an  interval  containing  almost  all  real  gaps.  The  almost  linear  portion 
of  the  f  distribution  near  middle  f  corresponds  to  time  values  covering  the 
region  of  interest  for  real  gaps,  so  this  linearity  indicates  that  the  locally 
uniform  assumption  is  a  good  one. 


Stability  of  the  gap  statistics  is  also  indicated  by  the  similarity  of 
the artifact f distribution for Interim Test and Test data. This is another
indication  that  improvement  in  the  artifact  gap  model  may  significantly  improve 
use  of  the  gap  information  source. 

This  analysis  reveals  a  tendency  to  underestimate  real  gap  densities,  and 
overestimate  artifact  gap  densities,  at  middle  f  values.  This  results  in  a 
considerable  underestimation  of  the  likelihood  ratio,  and  too  little  cost 
advantage  being  assigned  for  gaps  observed  in  this  region.  For  extreme  f 
values,  the  density  of  real  gaps  is  overestimated  and  the  density  of  artifact 
gaps  is  underestimated,  leading  to  overestimation  of  the  likelihood  ratio. 

As  a  result,  extremely  short  or  long  gaps  are  not  penalized  by  high  cost  to 
the extent they should be. The net effect of these model inadequacies is to
underemphasize  the  gap  information  source  by  assigning  costs  which  partially 
mask  the  true  significance  of  typical  and  atypical  gaps  alike.  This  is  a  very 
interesting  result  in  view  of  the  fact  that  gap  data  are  an  important  part  of 
the  interword  timing  information  source,  and  this  information  source  has  been 
found  to  be  the  most  productive  information  source  used  in  LISTEN.  Improve¬ 
ment  of  the  gap  model  would  then  seem  to  offer  significant  potential  for 
improving LISTEN's performance.

SECTION V

SUMMARY OF RESULTS AND CONCLUSIONS


SUMMARY  OF  RESULTS 

The VIAS project is a continuation of the NAVTRAEQUIPCEN's exploratory
development  program  for  automated  speech  technology.  It  has  contributed  to 
that  program  by  developing  a  working  system  suitable  for  laboratory  concept 
development  in  the  area  of  limited  connected  speech  recognition  which  is 
readily  modified  for  research  purposes.  This  system  permits  the  variation  of 
parameters  and  evaluation  and  analysis  of  effects  upon  recognition  results. 
Consideration  has  been  given  to  increasing  the  number  of  speakers,  automating 
the  process  of  reference  pattern  creation,  expanding  vocabulary  size,  and 
transferring  technology  to  a  new  preprocessor,  all  within  the  context  of  real¬ 
time  recognition. 

Specific  results  achieved  by  this  project  are  summarized  below. 

TRANSFER OF TECHNOLOGY. The real-time connected speech recognition system
LISTEN has been modified to operate successfully with a new model of speech
preprocessor.

EXTENSION  TO  NEW  SPEAKERS.  It  has  been  demonstrated  that  LISTEN  can  achieve 
connected  speech  recognition  accuracies  in  excess  of  ninety  percent  (word 
basis)  for  a  new  speaker. 

EXAMPLE  SET  GENERATION.  The  importance  of  the  method  of  generating  sets  of 
individual  vocabulary  items  used  in  creating  voice  reference  data  has  been 
demonstrated.

VOICE DATA GENERATION SYSTEM (VDGS). A unified body of computer programs for
generating voice reference data has been developed. These programs automate
the voice reference data creation process to the full extent practicable at
this time. These programs exist in two forms: as an almost autonomous
sequence of programs requiring an absolute minimum of human intervention, and
as a collection of individual programs which can be exercised independently
for research purposes. A detailed users manual has been provided for both
versions of this system of programming.

PERFORMANCE  ANALYSIS  SUBSYSTEM  (PASS).  A  useful,  powerful,  and  convenient  set 
of programs has been developed and exercised for analyzing the overall
performance and many technical details of LISTEN's operation. A users manual
also  has  been  provided  for  using  these  programs. 

VOCABULARY  EXPANSION.  The  number  of  vocabulary  items  which  can  be  accommodated 
by  various  VDGS  programs  has  been  increased  toward  the  desired  goal  of  thirty. 
The  goal  has  been  reached  for  several  of  those  programs,  and  no  fundamental 
barrier  exists  to  reaching  it  for  all  programs. 

ANALYSES OF LISTEN PERFORMANCE. Programs of the PASS have been used to analyze
the  significance  of  the  several  information  sources  LISTEN  uses  to  obtain  rec¬ 
ognition.  It  has  been  found  that  these  information  sources  vary  considerably 
in  their  utility  for  recognition.  Methods  of  automatically  classifying  and 
analyzing recognition errors have been developed and used. Among the many
findings, it has been shown that most recognition errors result from failure
to select the correct alternative in a simple substitution decision.

The  statistical  models  used  to  represent  the  information  sources  have  been 
examined  critically  with  a  variety  of  results.  While  the  models  have  generally 
been  shown  to  be  effective,  several  specific  modifications  to  simplify  data 
collection  or  improve  model  fidelity  (and  recognition  accuracy)  have  been 
suggested. 

CONCLUSIONS 

Results  obtained  in  the  VIAS  project  support  four  conclusions  of  general 
interest, as discussed in the following paragraphs.

MAGNITUDE  OF  THE  VOICE  REFERENCE  DATA  GENERATION  BURDEN.  Producing  the  VDGS 
was  a  major  task,  due  to  the  number  and  complexity  of  the  procedures  used  to 
produce  voice  reference  data  needed  by  the  LISTEN  real-time  recognition  pro¬ 
grams.  Using  the  VDGS  to  produce  voice  reference  data  for  new  speakers  also 
requires  a  considerable  amount  of  computer  time  and  labor.  These  facts  have 
made  clear  the  important  role  that  reference  data  generation  requirements  may 
have  in  determining  the  practicality  of  applying  a  connected  speech  recognition 
capability  in  a  training  environment. 

LISTEN  was  developed  with  primary  emphasis  on  real-time  operation  and  ex¬ 
ploitation  of  all  information  which  might  be  present  in  the  preprocessor  out¬ 
put,  and  essentially  no  concern  with  the  voice  reference  data  production 
burden.  Now  that  much  has  been  learned  about  the  nature  of  the  information 
present  in  the  preprocessor  output,  the  opportunity  exists  to  reformulate  the 
recognition  and  reference  data  extraction  processes  in  a  way  which  will  main¬ 
tain  or  improve  recognition  performance  while  minimizing  the  reference  data 
production  burden. 

INFORMATION  IN  THE  PREPROCESSOR  OUTPUT.  Analyses  performed  using  the  PASS  pro¬ 
grams  have  verified  the  presence,  and  elucidated  the  nature,  of  information 
sources  in  the  preprocessor  output.  Models  of  those  sources  posited  during 
LISTEN's  development  have  been  validated  to  varying  degrees,  but  the  validity 
of  the  models  is  secondary  in  significance  to  the  fact  that  those  information 
sources  have  been  isolated  and  demonstrated  by  objective  means  to  be  present 
and  to  have  utility  for  recognizing  connected  speech. 

THE  ANALYTIC  APPROACH.  Equally  significant  is  the  fact  that  the  approach  used 
in this project to evaluate LISTEN's performance has led to analytic procedures
which  reveal  the  character  and  relative  value  of  different  sources  of  informa¬ 
tion  in  a  preprocessor's  output.  This  approach  is  data  intensive  and  costly 
in  terms  of  computer  processing  requirements  for  developing  and  exercising  the 

APPENDIX A

VOICE  DATA  GENERATION  SYSTEM  USERS  MANUAL 

A.1 GENERAL

The  object  of  this  appendix  is  to  describe  in  some  detail  the  use  of 
the  Voice  Data  Generation  System  (VDGS).  Specifically  discussed  will  be 
what  is  involved  in  the  process  which  begins  with  the  extraction  of  voice 
samples  from  the  speaker  and  ends  with  the  creation  of  a  MIND  file.  The 
individual  programs  of  VDGS  also  will  be  described.  The  main  body  of  this 
appendix  describes  two  methods  of  using  the  VDGS  programs  to  prepare  the 
MIND  file  which  is  necessary  for  the  operation  of  LISTEN. 

The  operating  environment  for  which  the  VDGS  software  has  been  pre¬ 
pared  is  one  in  which  there  is  available  a  Data  General  S-130  minicomputer 
running  under  RDOS  with  a  Threshold  TTI-500  voice  preprocessor  and  stand¬ 
ard  peripheral  devices.  The  executable  files  for  each  of  the  individual 
VDGS  routines  are  intended  to  function  on  the  S-130.  However,  the  VDGS 
software  also  will  operate  on  a  Nova  3  minicomputer,  provided  all  routines 
are  recompiled  and  all  programs  reloaded. 

Program descriptions for VDGS routines are presented in A.8.

File descriptions for VDGS user-created files are presented in A.9.

Data files and compile and load macros are tabulated in A.10.

A.2 THE TWO METHODS OF VDGS

Before  LISTEN  can  perform  limited  continuous  speech  recognition  of  a 
given  speaker's  voice  it  is  necessary  to  construct  a  MIND  file.  A  MIND 
file is a file containing the concentrated statistical essence of a voice,
and many routines (twenty-four) are required to create it. We will describe
two different methods for using these twenty-four routines to create
the MIND file. One method, which we will refer to as the chain method, is
to use one routine and two command macros to execute all twenty-four routines
with operator intervention required at only one point. The other
approach, which we will call the step-through method, requires an operator
to execute each program separately and engage interactively with the
programs. The chain method has some limitations which will be described
later,  but  it  essentially  runs  by  itself.  The  step-through  method  is  more 
flexible,  but  it  requires  relatively  extensive  operator  input.  Whichever 
method  is  chosen,  it  must  be  followed  through  to  the  creation  of  the  MIND 
file.  In  the  sequel  we  will  describe  both  methods  for  using  the  LCSR  sta¬ 
tistical  preprocessing  package. 

A.3 THE VDGS CHAIN

INTRODUCTION  TO  CHAINMIND 

The  approach  to  using  the  VDGS  software  which  is  simplest,  in  the 
sense  of  requiring  the  least  input  from  the  operator,  is  embodied  in 
CHAINMIND - the VDGS chain. CHAINMIND has three parts: (1) the extraction
and compression of voice samples, (2) the creation of example spaces
and transition letter sets, and then (3) all the rest of statistics gathering
and statistical processing, including the building of the MIND file.

The  first  part  of  CHAINMIND  is  accomplished  by  the  program  EXTRACT 
which  prompts  the  user  to  speak,  extracts  raw  voice  data  from  the  TTI-500 
preprocessor  and  compresses  the  data  to  a  form  usable  by  the  remaining 
VDGS routines. The operation of EXTRACT is described later and some further
comments about its use are included in A.4.

The  second  part  of  CHAINMIND  is  GENTL,  a  small  chain  consisting  of 
the  programs  ESG  and  GZEC.  ESG  creates  eleven  example  spaces,  one  for 
each  of  eleven  vocabulary  items;  GZEC  creates  transition  letter  sets,  also 
one for each item. The operation of GENTL is also described in detail
later.

The third part of CHAINMIND is MAKEMIND, a chain of the remaining 21
VDGS  programs. 

USE  OF  CHAINMIND 

To  begin  CHAINMIND,  first  use  EXTRACT  to  create  compressed  data  files 
for all utterances in eighteen magic number sets, MNSETA through MNSETR.
This can be done over a period of time at the user's convenience. Probably
not more than six magic number sets (330 utterances, at 55 utterances per
set) should be spoken at a sitting.

After  all  eighteen  magic  number  sets  of  utterances  have  been  spoken, 
the  chain  GENTL  can  be  run.  To  do  this  make  sure  that  the  files  ZESG.SV 
and  ZGZEC.SV  are  on  the  speaker's  directory  (this  directory  should  have  a 
three  letter  name,  and  should  hold  all  the  compressed  data  files)  as  well 
as  the  data  files  PFILE  and  WIZ.ST  and  the  command  file  GENTL.  Having 
done  that,  type  @GENTL@,  and  the  example  spaces  ES$XXX$**  and  temporary 
transition letter set files TRLS**.TM and TRIX**.TM will be created.

When  GENTL  is  finished,  the  user  must  intervene  to  pick  the  best 
transition  letter  set  for  each  item.  The  procedure  for  choosing  the  best 
transition  letter  sets  is  described  later  when  the  program  RESCUE  is  dis¬ 
cussed.  When  the  best  transition  letter  sets  have  been  determined  and 
their  "RESCUE  indices"  found,  the  user  should  create  a  file  called  REDEEM 
with  the  editor.  The  user  then  should  enter  into  the  REDEEM  file  the 
eleven  RESCUE  indices,  in  order  from  the  first  vocabulary  item  to  the 
eleventh, one per line, in I2 format.
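
As an illustration of the REDEEM file layout, the sketch below writes eleven
indices, one per line, each right-justified in two columns. It assumes the
format meant is the FORTRAN I2 edit descriptor, and it is a modern-workstation
illustration rather than anything run under RDOS. The index values and the
function names are hypothetical.

```python
import tempfile
from pathlib import Path


def format_redeem(indices):
    """Return eleven lines, each a right-justified two-column integer."""
    if len(indices) != 11:
        raise ValueError("one RESCUE index is needed for each of eleven items")
    for index in indices:
        if not 0 <= index <= 99:
            raise ValueError(f"{index} does not fit in two columns")
    return [f"{index:2d}" for index in indices]


def write_redeem(indices, directory):
    """Write the REDEEM file into the given (speaker) directory."""
    path = Path(directory) / "REDEEM"
    path.write_text("\n".join(format_redeem(indices)) + "\n")
    return path


# Hypothetical RESCUE indices, first vocabulary item to eleventh.
indices = [12, 9, 15, 7, 11, 10, 8, 14, 13, 6, 9]

with tempfile.TemporaryDirectory() as speaker_dir:
    redeem = write_redeem(indices, speaker_dir)
    print(redeem.read_text(), end="")
```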

Once  the  file  REDEEM  has  been  created,  the  third  section  of  CHAINMIND 
can  be  run.  Again,  all  the  MAKEMIND  executable  files  must  exist  on  the 
speaker's  subdirectory,  together  with  all  the  compressed  data  files,  the 
magic  number  set  files,  and  the  file  REDEEM.  Then  the  user  must  create  a 
file  called  WHERE  with  a  single  entry  of  the  form  "disk  unit:  subdirec¬ 
tory"  indicating  where  the  speaker's  counter  data  files  will  reside  -  e.g. 
DP2:USG. To continue, type @MAKEMIND@, and (after 20-25 hours) the
MIND.VD  file  is  created. 

In summary, the voice technician supervising the operation of
CHAINMIND  should  proceed  like  this: 

a.  Make  sure  all  the  CHAINMIND  routines  and  data  files  exist  on  the 
speaker's  subdirectory.  (See  Table  A1) 

b.  Run EXTRACT to create the compressed data files.

c.  Run @GENTL@ to create example spaces and transition letter sets.

d.  Pick the best transition letter sets and create the REDEEM file.
Also create the WHERE file.

e.  Run  @MAKEMIND@  to  execute  the  remaining  VDGS  routines  and  create 
the  MIND  file. 
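
Step a of the summary lends itself to a simple pre-flight check. The sketch
below is a modern-workstation illustration only. The file names are those
given in the text, the MAKEMIND executables are omitted because they are not
named individually here, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path

# Files the text says must be on the speaker's directory before each stage.
GENTL_FILES = ("ZESG.SV", "ZGZEC.SV", "PFILE", "WIZ.ST", "GENTL")
MAKEMIND_FILES = ("REDEEM", "WHERE")  # executables not listed in this sketch


def missing_files(directory, required):
    """Return the required file names that are absent from the directory."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]


with tempfile.TemporaryDirectory() as speaker_dir:
    # Populate a scratch directory with only some of the required files.
    for name in ("ZESG.SV", "PFILE", "GENTL"):
        (Path(speaker_dir) / name).touch()
    print("missing before GENTL:   ", missing_files(speaker_dir, GENTL_FILES))
    print("missing before MAKEMIND:", missing_files(speaker_dir, MAKEMIND_FILES))
```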

SOME  COMMENTS  AND  CAVEATS 

The  CHAINMIND  method  of  VDGS  processing  has  some  rigidities  and 
limitations  which  must  be  pointed  out. 

a.  CHAINMIND  is  limited  to  the  use  of  eleven  machines  and  does  not 
allow  the  option  of  creating  universal  machines  for  the  special  handling 
of  initial  words  in  an  utterance. 

b.  The  magic  number  sets  to  be  used  for  training,  interim  test,  and 
test  data  are  fixed  in  CHAINMIND.  The  sets  MNSETA  through  MNSETF  are  used 
for  training  data,  MNSETG  through  MNSETL  for  interim  test  data,  and  MNSETM 
through  MNSETR  for  final  test  data. 

c.  Some  examples  in  the  ESG-created  example  spaces  may  be  too  long 
for  processing  by  GZEC  and  LOOPER,  and  these  examples  will  be  ignored. 

d.  There  is  no  way  for  the  user  to  intervene  and  remove  special  bad 
cases  in  the  counter  data  file  CDAT.RV  created  by  REVEXA  and  REVEX.  This 
mostly  has  the  effect  of  increasing  the  false  alarm  rate  later  on  in  the 
other  programs. 

e.  Perhaps most significantly, the CHAINMIND method has a poorer
facility for recovering from abnormal or error situations than the
step-by-step approach. This means that there are abnormal situations with
which CHAINMIND cannot cope and which will cause it to crash.
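
The fixed assignment of magic number sets in caveat b can be written out as a
small lookup. The set names and groupings are those given above; the function
name is illustrative.

```python
import string

# MNSETA through MNSETR: eighteen sets, in alphabetical order.
SET_NAMES = [f"MNSET{letter}" for letter in string.ascii_uppercase[:18]]
ROLES = ("training", "interim test", "test")


def chainmind_role(set_name):
    """Return the fixed role CHAINMIND gives to a magic number set."""
    position = SET_NAMES.index(set_name)  # raises ValueError if unknown
    return ROLES[position // 6]           # six consecutive sets per role


for name in SET_NAMES:
    print(f"{name}: {chainmind_role(name)}")
```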

A.4 THE VDGS STAND-ALONE VERSION

The  other  method  of  using  the  VDGS  programs  is  a  step-by-step  inter¬ 
active  procedure  wherein  the  user  executes  each  program  in  turn,  responds 
to  its  prompts,  and  examines  its  output  as  necessary.  A  list  of  the  VDGS 
programs  in  the  order  in  which  they  are  to  be  executed  appears  in  Table 
A1.  A  description  of  this  step-by-step  approach  follows. 

This approach has some obvious advantages over the CHAINMIND approach.
First of all, the programs can be run individually in relatively
small blocks of computer time and do not require a 20-hour block as does
MAKEMIND. Secondly, error situations and abnormal conditions are much
more easily responded to than in the CHAINMIND approach. If an error
occurs in the step-by-step approach, one need only back up a step or two
and restart. Thirdly, as will be seen in the sequel, the step-by-step
approach offers a degree of flexibility not available in CHAINMIND.

TABLE A1.  VDGS Programs in Order of Execution

     1.  EXTRACT
     2.  ESG
     3.  GZEC
     4.  RESCUE
     5.  SIGH
     6.  LOOPER
     7.  REVEXA
     8.  RVDIT
     9.  COVERT
    10.  INVERT
    11.  CROAK
    12.  REVEX
    13.  RVDIT
    14.  CROAK
    15.  ADDER
    16.  AVRAJ
    17.  CRAP
    18.  GAPSTER
    19.  SORTRA
    20.  SORTRB
    21.  GAPSTER
    22.  MUTE
    23.  GLOVE
    24.  TAILOR
    25.  BUILDER
    26.  DEALER
    27.  PHEW

NOTE:  Three routines, RVDIT, CROAK, and GAPSTER, are run at two different
stages.

With  these  observations  in  mind,  let  us  consider  the  VDGS  programs  in 
their  order  of  execution. 
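
The order of execution in Table A1 can be held as a simple list, which also
makes it easy to confirm that its twenty-seven steps use twenty-four distinct
routines. The program names are those of the table; the helper name is
illustrative.

```python
from collections import Counter

# Table A1, in order of execution.
EXECUTION_ORDER = [
    "EXTRACT", "ESG", "GZEC", "RESCUE", "SIGH", "LOOPER", "REVEXA", "RVDIT",
    "COVERT", "INVERT", "CROAK", "REVEX", "RVDIT", "CROAK", "ADDER", "AVRAJ",
    "CRAP", "GAPSTER", "SORTRA", "SORTRB", "GAPSTER", "MUTE", "GLOVE",
    "TAILOR", "BUILDER", "DEALER", "PHEW",
]


def summarize(order):
    """Return (number of steps, number of distinct routines, repeated names)."""
    counts = Counter(order)
    repeated = sorted(name for name, n in counts.items() if n > 1)
    return len(order), len(counts), repeated


steps, distinct, repeated = summarize(EXECUTION_ORDER)
print(f"{steps} steps, {distinct} distinct routines")
print("run at two different stages:", ", ".join(repeated))
```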

EXTRACT 

The  process  of  voice  data  extraction  begins  with  the  collection  of 
voice  samples  by  the  program  EXTRACT.  For  each  utterance  EXTRACT  creates 
a  compressed  data  file  (-.CD),  and  an  optional  raw  data  file  (-.RD),  all 
the while maintaining a listing file (EXTOUT.LS) if desired. It is the
set  of  compressed  data  files  that  is  used  in  the  remainder  of  the  voice 
data  generation  procedure. 

Probably  the  simplest  way  to  collect  voice  data  samples  is  to  proceed 
as follows: Take a disk which has been formatted and initialized and
which is substantially empty. Create a subdirectory with a three-letter-long
name, and copy onto this subdirectory the set of magic number sets
MNSET*  which  are  to  be  used,  as  well  as  the  file  EXTRACT.SV.  This  sub¬ 
directory  will  then  hold  all  the  -.CD  files  and  any  -.RD  and  listing  files 
created  by  EXTRACT.  The  separate  description  of  the  program  EXTRACT  tells 
how  to  proceed  from  here.  But  some  additional  comments  are  in  order. 

a.  The  room  where  the  voice  extraction  is  to  be  done  should,  of 
course,  be  kept  as  quiet  as  possible  to  avoid  excessive  noise  in  the  voice 
signal.  The  volume  adjustment  should  be  set  so  that  the  meter  registers 
about  0.8  when  the  word  "five"  is  spoken.  The  microphone  headset  should 
be  adjusted  so  that  it  is  comfortable  and  so  that  the  microphone  itself  is 
about  4  cm.  from  the  speaker's  mouth. 

b.  The number of voice samples taken during LCSR and VIAS work
amounted to eighteen magic number sets worth of utterances - six designated
training data, six designated interim test data, and six designated
test data. It is probably a good idea to limit the speaker to three magic
number sets at a sitting to avoid degradation in the voice samples due to
speaker fatigue or boredom. For the step-through operation of the VDGS
routines,  it  is  possible  to  use  fewer  than  six  magic  number  sets  apiece 
for  training,  interim  test,  and  test  data;  but  we  still  recommend  that 
altogether  eighteen  magic  number  sets  worth  of  data  be  used. 

c.  Each  magic  number  set  run,  if  desired,  creates  a  listing  file 
EXTOUT.LS which holds formatted versions of all compressed data files (and
raw data files if those also are being saved). If listing files are
desired for more than one magic number set, the file EXTOUT.LS should be
renamed at the end of each magic number set run. This listing file is
enormously large when both raw and compressed data are saved (it is on the
order  of  1.2  Mbits)  and  printing  it  consumes  a  great  deal  of  time  and 

In its step-by-step form ESG runs once for each vocabulary item. The
length of time for each run should be about 20 minutes.

Once  ESG  is  run  and  the  output  examined,  it  may  be  the  case  that  ESG 
has marked some words as too long for further processing. The user has
two options at this point: (1) continue and let the programs GZEC and
LOOPER ignore the words which are too long, or (2) use the example space
editor  ESDIT  to  modify  the  word  lengths.  For  an  explanation  of  the 
operation  of  ESDIT,  see  the  separate  program  descriptions.  If  the  user 
chooses  to  bypass  ESDIT  and  let  GZEC  and  LOOPER  ignore  some  examples,  then 
the  data  base  used  to  construct  transition  and  loop  letter  sets  will  be 
reduced  to  the  extent  of  the  number  of  ignored  words. 

In  lieu  of  using  ESG  to  generate  example  spaces  automatically,  one 
could also use the program GWIZ to facilitate hand-marking the training
data and the program MEND to create example spaces using the data produced
by  hand  marking.  For  descriptions  of  these  programs,  see  the  VDGS  auxil¬ 
iary  programs. 

GZEC 


Once  example  spaces  have  been  created,  the  VDGS  is  ready  to  generate 
transition  and  loop  letter  sets.  Since  the  collection  of  transition 
letter  sets  is  the  single  most  critical  item  in  the  VDGS  data  base,  it  is 
mandatory  that  each  transition  letter  set  be  generated  correctly.  The 
program GZEC, embodying the critical algorithm GENRLIZ, generates the
transition letter sets. For an explanation of GZEC, see the separate
program descriptions. GZEC runs once for each example space (and
consequently, for eleven vocabulary items, should be run eleven times).
Each run of GZEC takes about 20 minutes. Since some of the questions the
user will be asked by GZEC are not entirely self-explanatory, we make some
suggestions for responses below.

a. The listing file should be directed to the printer and not to disk, to
conserve disk space

b. Do not change the value of SDCOEFF

c. Do not use the cost-weight factors

d. There is no existing machine to be generalized

The operation of GZEC does not produce a single collection of
transition letter sets to be used. Rather, GZEC keeps a history of the
transition letter sets formed at each stage of its operation. It is up to the
user - using the routine RESCUE - to pick out the best transition letter
set for each vocabulary item.

Since it may happen that a particularly bad example of an utterance
is in the example space, or that a bad "cutting" of a vocabulary item
within an utterance has occurred, any collection of transition letter sets
can be resurrected as long as the temporary files TRLS**.TM and TRIX**.TM
still exist. This "resurrection" is done using the program RESCUE, which
asks for the "RESCUE INDEX." The "RESCUE INDEX" corresponds to that
number on the cost graph produced by GZEC indicating the desired set of
transition letter sets.

The  correct  RESCUE  index  for  each  vocabulary  item  is  determined  by 
looking  at  the  GZEC  printout.  Normally  the  last  transition  letter  set 
formed  by  GZEC  is  the  right  one.  In  this  case,  look  at  the  cost  graph  at 
the  end  of  the  particular  GZEC  run,  and  determine  the  last  "machine 
number"  corresponding  to  the  column  of  utterance  names  on  the  left  hand 
side. 

If it should happen that the last transition letter set formed by
GZEC differs significantly from the next-to-last or next-to-next-to-last,
then an earlier machine number should be chosen. In this case "differ
significantly" means that the final transition letter set was formed by
dropping three or more transition letters from the preceding transition
letter set. In a rare case the last transition letter set will be
"significantly different". Still rarer, the last two transition letter sets
will be significantly different. Once these machine numbers have been
chosen by the user for each vocabulary item, we're ready to run RESCUE,
pluck out the transition letter sets corresponding to those machine
numbers, and set up the transition letter set files to be used for the
rest of VDGS processing.
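
The selection rule described above can be sketched as follows. This is only an illustration, assuming the user has transcribed from the GZEC printout the number of transition letters in the set at each machine number, and that machine numbers run consecutively from zero. The function name and the list representation are hypothetical and are not part of the VDGS.

```python
def choose_rescue_index(letter_counts, drop_threshold=3):
    """Pick the machine number (list index) to hand to RESCUE.

    letter_counts[i] is the number of transition letters in the set
    formed at machine number i (an assumed, hand-transcribed input).
    """
    if not letter_counts:
        raise ValueError("no transition letter sets to choose from")
    index = len(letter_counts) - 1  # normally the last set is the right one
    # Step back while the current set was formed by dropping three or
    # more letters from its predecessor ("significantly different").
    while (index > 0
           and letter_counts[index - 1] - letter_counts[index] >= drop_threshold):
        index -= 1
    return index
```

For example, with counts of 14, 13, 12 and 8 letters, the final set dropped four letters at once, so the sketch steps back and returns index 2.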

RESCUE 

To run RESCUE, follow the instructions in the separate program
description. In RESCUE, the term "rescue index" means the same thing that
"machine number" does in the GZEC printout. RESCUE must be run once for each
vocabulary item. Its execution time is minimal.

We recommend that the user not delete the files TRLS**.TM and
TRIX**.TM when given this option by RESCUE. Should the transition
letter sets created by RESCUE be accidentally deleted or become inaccessible,
they can be re-created if the temporary files TRLS**.TM and TRIX**.TM
still exist. Otherwise, it would be necessary to run GZEC all over again.

SIGH 

The program SIGH is run next. It checks the transition letter sets
one-by-one to see if their length exceeds 13. If so, the length is
reduced by omitting the letters with most "?" features. If not, SIGH
simply passes on to the next item.
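
A minimal sketch of the length check performed by SIGH is given here. It assumes that each letter of a transition letter set is held as a string of feature marks, that the features counted are the "?" (indifference) marks used in transition letter set printouts, and that ties are broken by keeping the earlier letter. None of these details is confirmed by the text; the names are hypothetical.

```python
MAX_LETTERS = 13  # length limit quoted in the text


def trim_letter_set(letters, max_letters=MAX_LETTERS):
    """Drop the letters with the most "?" marks until the set fits."""
    if len(letters) <= max_letters:
        return list(letters)  # nothing to do; pass on to the next item
    # Rank positions by "?" count. sorted() is stable, so among ties the
    # earlier letter is kept (a tie-break the text does not specify).
    ranked = sorted(range(len(letters)), key=lambda i: letters[i].count("?"))
    keep = sorted(ranked[:max_letters])  # restore the original order
    return [letters[i] for i in keep]
```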

LOOPER

Once  transition  letter  sets  are  created,  rescued,  and  checked,  the 
loop  letter  sets  can  be  found  by  the  program  LOOPER.  LOOPER  must  be  run 
once  for  each  example  space  and  so  must  be  run  once  per  vocabulary  item  - 
eleven  times  for  eleven  items.  The  run  time  for  a  single  LOOPER  run  in 
this environment is about 30 minutes. A complete description of the
operation of LOOPER is included in the program descriptions.

REVEXA

The  real  statistical  data  collection  process  begins  with  REVEXA  after 
the  transition  and  loop  letter  sets  have  been  created.  The  purpose  of 
REVEXA  is  to  collect  counter  data  statistics,  i.e.,  statistics  of  time  of 
residence  of  an  incoming  letter  from  an  utterance  in  a  transition  or  loop 
letter  set.  To  this  end,  REVEXA  must  be  run  over  all  training  data  -  that 
is,  over  all  magic  number  sets  used  to  generate  the  training  data.  The 
explanation  of  how  to  run  REVEXA  is  included  in  the  program  descriptions. 
Recommended  responses  to  some  of  the  prompts  are  given  below: 

a.  The  mode  of  data  acquisition  should  be  3.  The  magic  number  sets 
to  be  used  here  should  be  the  ones  designated  for  training  data. 

b. All optional printing should be done. A great deal of
information about REVEXA and the recognition process in general is contained in
these printouts.

c. When, on the second and succeeding runs of REVEXA, the user is
asked if the CDAT.RV and CIDX.RV files are to be deleted, the answer
should be "no". In this case the program will continue to append to the
old files, and this is what is needed.

d. Later on we will explain why the user might choose to run one or
two initial machines. If the user is doing so, he must enter the
vocabulary item number for each initial machine and also a stop time (in TTI-500
time count units) for that initial machine.

e. For REVEXA, the user should request that only the machines in the
utterance be used.

A  large  file  of  counter  data  statistics  is  created  by  running  REVEXA 
over  the  six  magic  number  sets  constituting  training  data.  For  some  of 
the  utterances  processed  by  REVEXA,  the  subroutine  MINIMINT  (which  mimics 
the  operation  of  the  MINT  part  of  LISTEN)  cannot  come  to  a  conclusion.  In 
that  case  the  user  has  two  options:  (1)  ignore  the  misses  and  run  RVDIT 
with  no  counter  data  record  modifications,  or  (2)  list  the  record  numbers 
of all items occurring in a MINIMINT failure, and create a file RVCARDS in
the format described in A.9, with record number entries for all the
records to be flagged as "real." If the misses are simply ignored, the
number of artifacts generated in subsequent routines will be somewhat
larger, and the distinction between real recognitions and artifacts
blurs in proportion to the number of items ignored.

The  run  time  of  REVEXA  is  about  thirty  minutes  per  magic  number  set. 


RVDIT 


The  program  RVDIT  creates  individual  counter  data  records  for  each 
vocabulary  item  and,  if  desired,  deletes  from  consideration  all  records 
specified  in  the  file  RVCARDS.  The  use  of  RVDIT  is  further  explained  in 
the program descriptions. Run time for RVDIT is about twenty minutes.

COVERT

The  routine  COVERT  is  run  next.  Its  principal  function  is  to  create 
the  covariance  matrix  for  each  machine.  The  details  of  its  operation  and 
use  are  explained  further  in  the  program  descriptions. 

INVERT 

The  routine  INVERT  is  the  next  step.  It  inverts  the  covariance 
matrices  created  by  COVERT.  Its  use  is  also  explained  later.  In  running 
this routine the user has the option of computing and printing the
eigenvalues of the covariance matrices. The actual eigenvalues are not used
later  in  the  processing,  so  they  may  or  may  not  be  computed  at  the  user's 
discretion. 
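
The two steps performed by COVERT and INVERT can be illustrated with a small, self-contained sketch. It assumes that the counter data for one machine are available as rows of numbers, one row per observation, and that the sample (n - 1) normalization is wanted. The actual record format and normalization used by the VDGS routines are not described here, so both are assumptions, as are the function names.

```python
def covariance_matrix(samples):
    """Sample covariance matrix of a list of equal-length rows."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for row in samples:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (row[i] - means[i]) * (row[j] - means[j])
    return [[value / (n - 1) for value in line] for line in cov]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]
```

A singular covariance matrix (for example, from too few observations) raises an error in this sketch; how INVERT itself handles that case is not stated in this section.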

CROAK 


The last routine which operates using training data is CROAK. The
program CROAK is an eclectic routine which performs all manner of
statistical computations and prints plots of 6 and p distributions. A
discussion of the operator's interaction with CROAK appears in the program
descriptions of this appendix. CROAK must be run twice over each
vocabulary item - once to generate statistics about real recognitions, and once
to generate statistics about artifacts. So, for the first CROAK run, the
user should:

a.  Answer  "2"  to  the  question  about  modes 

b.  Save  the  probability  statistics 

c. Set starting machine number = 00 and end machine number equal to
the last machine used (10, 11, or 12)

Then  CROAK  runs  about  thirty  minutes.  For  the  second  CROAK  run,  the  user 
should: 

a.  Answer  "4"  to  the  question  about  modes 

b. Enter starting and end machine numbers as before

Then CROAK runs again for about thirty minutes.
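
As a rough planning aid, the run times quoted so far for the training-data phase can be added up. The sketch below assumes eleven vocabulary items and six training magic number sets, as in the text, and assumes that RVDIT runs once and CROAK runs twice in total. Programs for which no run time is quoted (RESCUE, SIGH, COVERT, INVERT) are left out, so the figure is a lower bound.

```python
VOCABULARY_ITEMS = 11  # eleven vocabulary items, as in the text
TRAINING_SETS = 6      # six magic number sets of training data

# (program, minutes per run, number of runs), using the times quoted above
RUNS = [
    ("ESG", 20, VOCABULARY_ITEMS),
    ("GZEC", 20, VOCABULARY_ITEMS),
    ("LOOPER", 30, VOCABULARY_ITEMS),
    ("REVEXA", 30, TRAINING_SETS),
    ("RVDIT", 20, 1),   # assumed: a single run
    ("CROAK", 30, 2),   # assumed: one "real" run and one "artifact" run
]


def total_minutes(runs):
    """Sum minutes-per-run times number-of-runs over all programs."""
    return sum(minutes * count for _name, minutes, count in runs)


if __name__ == "__main__":
    minutes = total_minutes(RUNS)
    print(f"{minutes} minutes (about {minutes / 60:.1f} hours)")
```

Under these assumptions the quoted times add up to 1030 minutes, a little over 17 hours of machine time before the interim test data are touched.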

SOME  CLEANUP 

At  this  point  the  user  should  do  some  disk  cleanup.  He  can  and 
should  delete  the  following  files:  CDAT-.RV,  CIDX.RV,  QDAT-.RR,  QDAT-.AF, 
MUDT-.RR, MUDT-.AF, RVX.ST, and RVCARDS.

REVEX 


Now  we  begin  to  use  the  interim  test  data.  The  general  description 
of the use and functions of the program REVEX is contained in the program
descriptions, but there are a few suggestions we should make.

a. The mode of data acquisition should be 3. The magic number sets
to be used here should be the ones designated for interim test data.

b.  All  optional  printing  should  be  done. 

c.  When,  on  the  second  and  succeeding  runs  of  REVEX,  the  user  is 
asked  if  the  CDAT.RV  and  CIDX.RV  files  are  to  be  deleted,  the  answer 
should  be  "no".  Then  REVEX  will  continue  and  append  to  the  old  files. 

d. If the user is running one or two initial machines, he must, at
the  appropriate  prompt,  enter  the  vocabulary  item  number  for  each  initial 
machine  and  also  a  stop  time  (in  TTI-500  counts)  for  each  initial  machine. 

e.  The  user  should  request  that  all  machines,  not  just  the  ones  in 
the  utterance,  be  run.  This  is  important  because  this  is  the  point  at 
which  data  about  artifacts  is  gathered. 

With  REVEX,  as  with  REVEXA,  a  large  file  of  counter  data  statistics 
is  created  by  running  the  program  over  the  six  magic  number  sets  of 
interim  test  data.  Also,  some  of  the  utterances  will  not  be  recognized 
correctly by the MINIMINT subportion of REVEX. In these cases the user
once again has the choice of ignoring the MINIMINT failures and
proceeding, or of creating the file RVCARDS of records to be flagged as "real."
If this option is chosen, all record numbers corresponding to real
recognitions should be entered in RVCARDS.

RVDIT 

Run  RVDIT  just  as  before  on  REVEX  output  files  CDAT.RV  and  CIDX.RV. 

CROAK 

Run CROAK just as before.

ADDER 

The  next  VDGS  routine  to  be  run  is  ADDER.  This  program  builds  a 
table  of  transition  and  loop  letter  set  violations  for  each  utterance 
processed by REVEX. A complete description of ADDER is given in the
program descriptions, and the only additional suggestion to be made is that
the output should not be directed to disk since disk space is probably
scarce at this point.

AVRAJ 


The  program  AVRAJ  is  the  next  step  in  the  process.  AVRAJ  computes 
and  prints  the  average  word  length  for  all  vocabulary  items.  The  routine 
should  be  run  as  described  below. 




CRAP 


The  critical  association  parameters  are  determined  by  CRAP.  CRAP 
should  be  run  as  indicated  in  the  program  descriptions.  As  usual,  the 
listing  file  should  be  directed  to  the  printer  and  not  to  the  disk. 

GAPSTER 

The  program  GAPSTER  is  primarily  responsible  for  creating  the  gap 
matrix  and  the  QASM  matrix  needed  in  the  MIND  file.  The  operation  of 
GAPSTER  is  described  in  the  sequel  in  the  program  descriptions,  but  a  few 
comments  about  user  inputs  to  GAPSTER  are  suggested  below. 

a.  The  critical  association  parameter  entered  should  be  1.0. 

b.  The  real  standard  deviation  spread  factor  for  the  gap  matrix  also 
should  be  entered  as  1.0. 

c.  The  quartile  and  mean  calculations  are  optional  and  not  used  in 
later  processing. 

d.  Disk  file  output  should  not  be  chosen. 

SORTRA 

Run  SORTRA  to  sort  the  file  GAPMAX. 

SORTRB 

Run SORTRB to sort the file CONGAP.

GAPSTER 

Delete the files QASM.DT, GAP.DT, and GAPMAX, and re-run GAPSTER as
before.

MUTE 


Run  MUTE  to  compute  the  L-counter  parameters  MDLA**  for  each  machine. 


GLOVE 


Run GLOVE to do the curve fitting for CROAK-generated 6-distributions.

TAILOR

Run  TAILOR  to  compute  the  T-counter  parameters  MDTA**  for  each 
machine. 

BUILDER 

Run BUILDER to create the machine data file.

DEALER

The program DEALER creates the MIND file. For the most part, its
operation is described in the program descriptions of this appendix.
However, there are a couple of required operator inputs that are not
completely self-explanatory. In the first place, the "revision number for
this job" should be entered as "*0" for all speakers. Secondly, when
there are one or more universal machines being run, the program DEALER
will ask for vocabulary identification number and end time for each
machine, and these must be supplied by the user. For example, if machine
11 is a universal machine for vocabulary item 2, an appropriate response
might be "2,25" when vocabulary item identification and end time are asked
for.

PHEW 

Then  the  process  of  the  VDGS  is  completed  by  PHEW  which  finishes  the 
building  of  the  MIND  file.  The  only  response  required  of  the  operator 
here  is  the  entry  of  the  total  number  of  vocabulary  items  used. 

A.5 THE AUXILIARY PROGRAMS

The auxiliary programs delivered with the VDGS are: GASP, ESDIT,
ESGDIT, MEND, and GWIZ. Here we describe the function of these auxiliary
routines and indicate how they add to the flexibility of the VDGS.

The program GASP has the simple function of printing the transition
letter sets after the program RESCUE has been run, or printing the merged
transition and loop letter sets after the program LOOPER has been run.
Using GASP the user can see, and group together on hardcopy for future
reference, the transition letter sets for each vocabulary item (with
accompanying loop letter sets, if desired).

The  programs  ESDIT  and  ESGDIT  are  both  concerned  with  the  editing  of 
example  spaces.  ESDIT  allows  the  user  to  change  individual  start  or  stop 
times  in  the  example  space  using  either  an  ESG  or  a  GZEC  print.  The 
reason that ESDIT is sometimes used is that the individual start/stop
times in the example spaces are sometimes bad - either the time duration
for the word is too long, or the word has been "cut out" from the
utterance in a less than satisfactory way. This "bad cutting" can arise from
either an anomaly in the automatic example space generator ESG, or a human
error if manual hand-marking is done using GWIZ and MEND.

In  any  case,  if  the  user  wishes  to  modify  the  individual  start/stop 
times  in  an  example  space,  just  what  numbers  are  entered  depends  upon  what 
program's  output  is  being  used.  If  an  ESG  printout  is  being  used,  simply 
enter  the  new  start  and  stop  times  when  the  program  requests  them.  If  a 
GZEC printout is used, the situation is a little more complicated. If the
old beginning time is Tb and the user wishes to change this to
Tb', enter Tb' - Tb + 1 as "new starting record". So, if the
beginning record number is correct as is, enter 1. To change the end time from
Te to Te', enter Te' - (Tb + total number of records in word) + 1 as
"new ending record".

The program ESGDIT is designed to operate on an existing example
space file to produce a new example space file in which all utterances
beginning with the vocabulary item specified are omitted. The operation of
ESGDIT is explained fully in the program descriptions. Just why ESGDIT is
used  is  explained  below  where  universal  and  initial  machines  are 
discussed. 

The  routines  GWIZ  and  MEND  are  programs  to  be  employed  when  manual 
"hand-cutting"  of  utterances  into  individual  words  is  to  be  used  as  a  step 
in  the  creation  of  example  spaces.  The  technique  of  hand-cutting  is 
described  later.  The  general  procedure  for  the  semi-manual  creation  of 
example  spaces  is  as  follows: 

a.  Create  a  file  GWIZ.CD  which  holds  the  names  of  all  compressed 
data  files  corresponding  to  training  data.  This  file  should  hold  one  file 
name  per  line,  left  justified.  A  relatively  painless  way  of  constructing 
this  file  is  to  use  the  BUILD  command  at  the  CLI  level  as  was  previously 
described  for  the  program  ESG. 

b. Run GWIZ, following the instructions given in the separate program
description.

c.  Hand  cut  the  GWIZ  printouts,  noting  start  and  end  times  of  each 
word  within  each  utterance. 

d.  Create  a  file  MEND.WP  holding  all  this  data  from  hand-cutting 
(the  format  of  this  file  is  described  in  the  description  of  MEND). 

e.  Run  MEND  to  create  the  example  spaces. 
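
Step a of the procedure above (one compressed data file name per line, left justified) was done on the original system with the BUILD command at the CLI level. As a modern stand-in, the same list could be produced as sketched below. The sketch assumes the recommended naming convention, in which the first character of each file name identifies the magic number set; the function name and the `training_sets` argument are hypothetical.

```python
import os


def write_file_list(directory, training_sets, out_name="GWIZ.CD"):
    """Write the names of the training .CD files, one per line."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.upper().endswith(".CD")
        and name.upper() != out_name.upper()      # skip the list file itself
        and name[0].upper() in training_sets      # keep training sets only
    )
    with open(os.path.join(directory, out_name), "w") as handle:
        for name in names:
            handle.write(name + "\n")             # left justified, one per line
    return names
```

Calling `write_file_list(".", "ABCDEF")` would, for instance, list only the files belonging to number sets A through F, if those were the sets designated as training data.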

A.6 HAND-CUTTING THE DATA

The process of hand-cutting data to separate words within an utterance
is as much of an art as a science and is best learned by doing.
However,  there  are  some  rules  of  thumb  and  general  guidelines  that  the 
speech  technician  might  wish  to  consider. 

a. The program GWIZ itself indicates locations of words within
utterances on its printout. These are quite helpful but generally are not
refined enough to be used as more than guidelines.

b. Whoever hand-cuts data must become extremely familiar with the
hard copy presentation of an utterance and the variety of patterns
associated with each particular item. It is a good idea to start with
utterances consisting of a single word and to compare those with utterances
where that word is only a part. The important thing is to be able to
distinguish words visually and locate the interword boundaries. Because of
co-articulation effects, it is important to allow overlap of word
boundaries; rarely, in the context of hand-cutting, should the utterances be
divided into non-overlapping segments. The idea here is that the
transition letter set maker, GZEC, will discern the important structure within
the utterance, and that one should not attempt to make too fine or too
subtle distinctions during the hand-marking process.

c. The geometrical configurations of the vocabulary items are a
significant aid in marking the data. The "shapes" of words should be
learned well before much hand-marking is done.

d.  Sibilants  and  fricatives  are  a  big  help  in  marking  the  data.  The 
"s",  "x",  and  "th"  sounds,  when  identified,  make  the  demarking  procedure 
much  easier. 

e. The relative letter counts indicated on the GWIZ printout help
pick out longish sounds, e.g., "...ee" in "three", "...oo" in "two",
etc.

f.  Features  16-18  in  GWIZ  printout  are  generally  set  for  fricatives, 
e.g.,  "s"  in  "six";  features  15-18  are  often  set  in  the  "x"  of  "six". 

g. Relatively long stops occur preceding "two", "point", and
"three", when these items occur in the middle of an utterance.

h. The vocabulary item "eight" is short and hard to pick out. For
this one, attention is best paid to relative time counts of two or three
different basic sound groups.

i.  Fluid  vowel  sounds  are  sometimes  very  hard  to  distinguish,  and 
often  vary  considerably  from  sample  to  sample. 
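
Rule of thumb f above lends itself to a simple screening aid. The sketch below assumes that each line of a GWIZ printout has been transcribed as the set of feature numbers that are set. That representation and the function name are hypothetical, and since the text says only that these features are "generally" or "often" set, the result is a hint for the technician and not a classification.

```python
FRICATIVE_FEATURES = {16, 17, 18}      # rule of thumb f: "s" in "six"
X_SOUND_FEATURES = {15, 16, 17, 18}    # rule of thumb f: "x" of "six"


def sound_hint(features_set):
    """Return a hint string for one printout line, or None."""
    features_set = set(features_set)
    if X_SOUND_FEATURES <= features_set:     # all of 15-18 are set
        return "possible x"
    if FRICATIVE_FEATURES <= features_set:   # all of 16-18 are set
        return "possible fricative"
    return None
```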

A.7 ON UNIVERSAL AND NON-INITIAL MACHINES

For some speakers a given vocabulary item can apparently vary
significantly, depending on whether it is the initial word in an utterance. If
the difference between initial and non-initial voicings of a word is
significant enough, then a recognition process which does not distinguish
initial  from  non-initial  will  not  work  very  well.  The  VDGS  has  some 
facility  for  dealing  with  this  problem,  at  least  to  a  limited  extent. 

When  REVEXA  is  run,  one  has,  for  the  first  time,  some  indication  of 
how  well  the  transition  letter  sets  are  performing  for  the  standard  eleven 
vocabulary  items.  If  the  recognition  of  initial  digits  is  noticeably  bad 
for one or two items, the speech technician has the option of creating
"universal" and "non-initial" machines for these items, with separate
transition and loop letter sets. Concretely, this means that the
following steps must be carried out (to be specific here, we assume that
the vocabulary items "two" and "three" for Ulysses S. Grant require both
non-initial and universal machines):

a. Rename ES$USG$02 as ES$USG$11

b. Rename ES$USG$03 as ES$USG$12

c. Run ESGDIT, entering "2" as vocabulary item, example space name
ES$USG$11, and new example space name as ES$USG$02.

d. Run ESGDIT, entering "3" as vocabulary item, example space name
ES$USG$12, and new example space name as ES$USG$03.

e. Rename MC02.TL as MC11.TL

f. Rename MC03.TL as MC12.TL

g. Rename MC02.LP as MC11.LP

h. Rename MC03.LP as MC12.LP

i. Delete TRLS02.TM, TRLS03.TM, TRIX02.TM, and TRIX03.TM

j. Run GZEC for the new example spaces ES$USG$11 and ES$USG$12

k. Run LOOPER for these new example spaces

l.  Rerun  REVEXA. 
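
The file bookkeeping in steps a through i is easy to get wrong by hand, so it may help to generate it as a checklist. The sketch below assumes the file names follow the patterns ES$<speaker>$<nn>, MC<nn>.TL, MC<nn>.LP, TRLS<nn>.TM and TRIX<nn>.TM. It only lists the operations as tuples and performs none of them; the function name and the tuple layout are hypothetical. The GZEC, LOOPER and REVEXA reruns of steps j through l are interactive and are not covered.

```python
def universal_machine_plan(speaker, item_to_machine):
    """List the file operations of steps a-i as (action, args...) tuples.

    item_to_machine maps a vocabulary item number to the machine number
    that will hold its universal machine, e.g. {2: 11, 3: 12}.
    """
    plan = []
    items = sorted(item_to_machine)
    for item in items:  # steps a-b: keep the full example space
        machine = item_to_machine[item]
        plan.append(("rename",
                     f"ES${speaker}${item:02d}", f"ES${speaker}${machine:02d}"))
    for item in items:  # steps c-d: ESGDIT builds the reduced example space
        machine = item_to_machine[item]
        plan.append(("esgdit", str(item),
                     f"ES${speaker}${machine:02d}", f"ES${speaker}${item:02d}"))
    for ext in ("TL", "LP"):  # steps e-h: move the existing letter sets
        for item in items:
            machine = item_to_machine[item]
            plan.append(("rename", f"MC{item:02d}.{ext}", f"MC{machine:02d}.{ext}"))
    for prefix in ("TRLS", "TRIX"):  # step i: remove stale temporary files
        for item in items:
            plan.append(("delete", f"{prefix}{item:02d}.TM"))
    return plan
```

For the example in the text, `universal_machine_plan("USG", {2: 11, 3: 12})` yields twelve operations, beginning with the rename of ES$USG$02 to ES$USG$11.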

A.8 PROGRAM DESCRIPTIONS FOR VDGS ROUTINES

Program descriptions for VDGS routines follow.

1. EXTRACT


Title: EXTRACT.SV

Purpose:

The  purpose  of  EXTRACT  is  fourfold: 

a.  Prompt  the  speaker  to  voice  an  utterance. 

b.  Save  the  TTI-500  features  generated  by  that  utterance  on  a  disk 
file. 

c.  Compress  the  features  for  LCSR  processing. 

d. Provide hardcopy printouts of both raw feature data and
compressed data.


Printout: 

The TTI-500 detects 32 features every 2 msec. One of these features
(LP4) signals a long pause: the speech sample is complete. The
software backs up 50 TTI-500 samples (each a set of 32 features), and
continues going back through the samples. When feature 26 (UVNLC),
28 (n-j + nj) or 2b (EG^ + EG2) is found, the search
terminates. This collection of features we call the "raw-data."

The  data  extraction  program,  at  the  user's  option,  will  print  this 
raw data in the standard space/asterisk format. The printout is
consistent with TTI conventions: feature 1 (MAXD1) is at the left,
feature  32  (LP4)  on  the  right. 

A  letter  is  the  subset  of  features  17-31.  Associated  with  each 
letter  is  a  count  of  the  number  of  times  that  letter  occurred  in  the 
raw  data,  interrupted  by  not  more  than  one  occurrence  of  any  other 
letters. (Such single count letters are always ignored.) This
collection of letters and counts we call the "compressed data."

The  data  extraction  program  reduces  the  raw  data  and  prints  the 
resulting  compressed  data. 
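
The reduction from raw data to compressed data can be sketched as follows. This is one reading of the description above, not the EXTRACT implementation: a run of a letter continues across a single interrupting sample of another letter, the interrupting sample is skipped, and any letter left with a count of one is dropped. Two separate runs of the same letter are not merged when more than one sample lies between them. Samples are assumed to be sets of feature numbers; all names are hypothetical.

```python
def letter_of(features):
    """A letter is the subset of features 17-31 of one sample."""
    return frozenset(f for f in features if 17 <= f <= 31)


def compress(letters):
    """Reduce a sequence of letters to (letter, count) pairs."""
    out = []
    i = 0
    n = len(letters)
    while i < n:
        letter = letters[i]
        count = 0
        while i < n:
            if letters[i] == letter:
                count += 1
                i += 1
            elif i + 1 < n and letters[i + 1] == letter:
                i += 1  # skip a single interrupting sample
            else:
                break
        if count > 1:   # single count letters are ignored
            out.append((letter, count))
    return out
```

With letters written as single characters for brevity, the sequence A A B A A compresses to a single entry for A with a count of 4 under this reading.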

User  Dialog: 

EXTRACT 

(Ensure  that  any  subdirectories  to  be  used  are  initialized.  It  is 
recommended  that  each  user  utilize  a  personal  subdirectory  so  that 
multiple  copies  of  data  files  for  the  same  digit  string  can  be  kept. 
An  identifier  and  instructions  appear:) 

DATA  EXTRACTION  PROGRAM — ECLIPSE  RDOS  REV  6.23 


STRIKE CNTRL-A TO EXIT FROM PROGRAM

(The user is requested to enter his name and an identifying comment
of  up  to  80  characters  for  this  run.  This  information,  together  with 
the  date,  is  printed  as  a  header  on  all  printouts  of  the  program. 

Next, the user is queried to determine if he wishes to use a
predefined prompting file. If so, he enters the file name. The program
will  verify  that  the  name  and  file  exist,  but  no  other  special  checks 
are  made. 

Following  queries  to  determine  if  the  user  wishes  the  printout  of  raw 
data  and/or  compressed  data,  the  program  prepares  to  accept  speech 
data.  If  no  prompting  file  is  named,  the  user  is  commanded:) 

SPEAK!!

(The  TTI-500  is  activated  and  the  program  "listens"  until  the  LP4 
feature  is  detected.  The  TTI-500  is  then  de-activated  and  the  user 
is  requested  to) 

ENTER  COMMENT  LINE: 

(Up  to  forty  characters  may  be  entered,  then) 

ENTER  RAW  DATA  FILE  NAME: 

ENTER  COMPRESSED  DATA  FILE  NAME: 

(If  a  file  name  which  is  already  used  is  entered,  the  user  is  told  of 
the  condition  and  requested  to  redefine  the  file.  A  more  serious 
error  (e.g.,  directory  not  initialized)  will  cause  an  abnormal  return 
to  CLI. 

The following convention is recommended for use in naming the raw
data and compressed data files: five characters, dot, "RD" or "CD".
Of the five characters, the first is the number set identifier (A-K)
or "X" if no number set is used. The following four characters
represent the number spoken, with "N" meaning "null" and "P" meaning
"point." The .RD and .CD extensions refer to "Raw Data" and
"Compressed Data."

Once the files are named, the program proceeds to print the
compressed data and raw data (if the user requested it) and to write the
data onto the disk files. The program then goes back to listening,
signaled by the SPEAK!! command. Note that unless spooling is
disabled, the line printer may still be active (very noisy!) when
SPEAK!! is offered. The user should be careful not to turn on the
microphone until after the printing is complete.

If the user had named a prompting file, the request to SPEAK!! will
be replaced by)

SAY: number

(Where  the  number  is  retrieved  from  the  prompting  file.  When  the 
TTI-500  "hears"  something,  assumed  to  be  the  prompted  string,  the 
notification) 

OK 

(is  given.  The  file  names  are  automatically  retrieved  from  the 
prompting  file  (the  convention  noted  above  is  used)  and  the  printout 
and  disk  writing  is  performed.  If  the  files  already  exist,  the  user 
is  requested  to  intervene  and  name  files  on-line  for  the  data.  The 
user  should  note  these  problems  and  resolve  them  following  the  data 
extraction  session.  When  the  prompting  file  is  exhausted,  the 
warning) 

NO  MORE  PROMPTS  IN  file  name 

(is  given  and  the  user  is  informed  that  the  program  will) 

GO  BACK  TO  START! 

Input  File: 

MNSET-, the number set file

Output Files:

-.CD,  the  compressed  data  files. 

-.RD,  the  raw  data  files. 
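
The naming convention recommended in the user dialog for these output files (five characters, a dot, then RD or CD) can be checked mechanically. The sketch below assumes that the four characters after the number set identifier are digits, "N" for null or "P" for point, and that "N" acts as padding which can be dropped when reading the number back. The padding interpretation, the function name and the returned dictionary are assumptions.

```python
import re

# First character: number set identifier A-K, or X if no number set is used.
NAME_PATTERN = re.compile(r"^([A-KX])([0-9NP]{4})\.(RD|CD)$")


def parse_data_file_name(name):
    """Split a data file name into its parts, or return None if invalid."""
    match = NAME_PATTERN.match(name.upper())
    if match is None:
        return None
    number_set, spoken, extension = match.groups()
    return {
        "number_set": None if number_set == "X" else number_set,
        "spoken": spoken.replace("N", "").replace("P", "."),
        "kind": "raw" if extension == "RD" else "compressed",
    }
```

For example, `parse_data_file_name("A12P3.CD")` describes a compressed data file from number set A for the spoken string "12.3".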

Error  Messages: 

Only the standard RDOS file error messages are applicable to EXTRACT.

2. ESG

Title: ESG.SV

Purpose:

ESG  builds  an  example  space  for  a  specified  vocabulary  item  from 
specified  compressed  data  files.  Inputs  to  ESG  include  a  list  of  the 
compressed  data  files  which  contain  this  item,  and  the  canonical  set 
of  length  and  stretch  factors.  ESG  determines  what  portion  of  the 
utterance  is  most  likely  to  contain  that  item,  and  writes  the  file 
name  and  starting  and  ending  records  to  the  example  space  file. 

Printout:

ESG provides a printout which describes the example space file
entries. A printout of the automatically selected portion of the
utterance is also provided under certain conditions when ESG
determines that hand marking of the data may be required. This occurs
whenever the selected portion of the utterance is too long for
GENRLIZ to accommodate, and when a doublet occurs. If modifications
are required, the example space file can be edited using ESDIT.

User  Dialog: 

ESG 

ENTER  THE  EXAMPLE  SPACE  FILE  NAME 
(THE  FIRST  2  CHARACTERS  MUST  BE  'ES'): 

(This  is  the  designated  file  name  of  the  example  space  file  which  is 
to  be  generated.) 

FILE  ALREADY  EXISTS. 

MAY I DELETE IT (Y/N)?

(If  the  example  space  file  already  exists,  the  user  can  choose  to 
continue  by  deleting  the  existing  file  or  terminate. 

If  he  chose  to  terminate,  the  CRT  displays) 

STOP-  EXAMPLE  SPACE  FILE  ALREADY  EXISTS. 

(Otherwise  the  dialog  continues) 

ENTER  A  BRIEF  DESCRIPTION  OF  THE  FILE: 

ENTER  THE  2-DIGIT  VOCABULARY  ITEM  #  (00-10): 

ENTER  THE  PROMPTING  FILE  NAME: 

(The prompting file contains the names of the compressed data
files used in generating the example space file.)

ENTER THE 3-LETTER SUBDIRECTORY NAME:

COMPLETED  PROCESSING  VOCABULARY  ITEM  #: 

USING  PROMPTING  FILE:  AND  SUBDIRECTORY 

DO  YOU  WISH  TO  CONTINUE  PROCESSING  ON  THIS  VOCABULARY  ITEM  (Y/N)? 

(If  the  user  wishes  to  continue  processing,  the  program  requests 
another  input  of  a  prompting  file  name  and  a  subdirectory  name.  The 
program continues building the example space file on the same
vocabulary item using the newly specified prompting file and subdirectory.
Otherwise  the  program  terminates.) 

STOP 

Input  Files: 

WIZ.ST,  the  file  of  length  and  stretch  factors 

Prompting  file  (a  user-supplied  filename  -  e.g.  PFILE)  with  the  file 
names  of  the  compressed  data  files  to  be  used. 

Specified  compressed  data  files 

Output  Files: 

Example  space  file  for  the  specified  vocabulary  item. 

Error  Messages: 

INVALID  VOCABULARY  ITEM  #  ENTRY 
(Another  input  is  requested.) 

STOP-FILE  WIZ.ST  DOES  NOT  EXIST 

(The  program  terminates  without  this  input  file.) 

CKST — FILE  DOES  NOT  EXIST: 

(If  the  prompting  file  does  not  exist,  the  program  terminates.  If  a 
specified compressed data file does not exist, the program continues
with  the  next  specified  compressed  data  file.) 


CKST - UNKNOWN  ERROR: 


FILE: 

3. GZEC

Title:  GZEC.SV 

Purpose : 

GZEC  finds  the  set  of  transition  letter  sets  for  a  specified  vocab¬ 
ulary  item,  using  GENRLIZ. 

Printout : 

GZEC  provides  three  types  of  printout.  The  first  describes  the 
development of the set of transition letter sets. In the compressed
data  printout,  the  header  shows  the  feature  number.  Below  this,  in 
the  "FREQ"  column,  the  letter  itself,  delimited  by  the  "j"  symbol,  is 
printed.  The  features  set  in  a  letter  are  shown  by  the  symbol 
blanks  indicate  the  feature  was  not  set.  In  the  transition  letter 
set  printout,  the  means  the  feature  must  be  set,  a  blank  means 

the  feature  must  not  be  set,  "?"  indicates  indifference,  and  "N" 
shows  where  a  modification  to  accommodate  this  utterance  occurred. 

The  mapping  of  the  transition  letter  sets  into  the  utterance  is  shown 
by  printing  the  particular  set  next  to  the  first  letter  in  the  utter¬ 
ance  which  is  contained  in  that  set  which  occurs  after  a  letter  in 
the  previous  set.  The  "NUMBER  OF  TRANSITION  LETTER  SET"  column  shows 
the  relation  of  the  current  sets  to  the  initial  or  seed  set  of  trans¬ 
ition  letter  sets. 

The  second  printout  shows  the  cost  to  modify  the  transition  letter 
sets,  the  current  mean  cost  and  standard  deviation  for  each  example 
encountered.  The  plot  of  these  values  is  useful  for  detecting  bad 
examples.  The  rescue  index  is  used  to  retrieve  any  particular  set  of 
transition  letter  sets  for  further  use. 
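
The use of the cost, mean cost and standard deviation for screening examples can be sketched in Python as follows. This is a minimal sketch, not the GZEC code: it applies the SDCOEF rule given in the user dialog (a modification is allowed if its cost is below the mean cost plus SDCOEF standard deviations), and it assumes that the statistics are taken over previously accepted examples, that a population standard deviation is used, and that the first few examples are accepted unconditionally. The names screen_examples and min_history are hypothetical.

```python
from statistics import mean, pstdev


def screen_examples(costs, sdcoef, min_history=3):
    """Return one (cost, mean, sd, accepted) tuple per example.

    Assumptions (not from the report): statistics are computed over the
    costs accepted so far, and the first `min_history` examples are
    accepted unconditionally so that the statistics are meaningful.
    """
    accepted_costs = []
    results = []
    for cost in costs:
        if len(accepted_costs) < min_history:
            mu = mean(accepted_costs) if accepted_costs else cost
            sd = pstdev(accepted_costs) if accepted_costs else 0.0
            accepted = True
        else:
            mu = mean(accepted_costs)
            sd = pstdev(accepted_costs)
            # SDCOEF rule: allow the modification only below the threshold.
            accepted = cost < mu + sdcoef * sd
        if accepted:
            accepted_costs.append(cost)
        results.append((cost, mu, sd, accepted))
    return results


for row in screen_examples([2.0, 3.0, 2.5, 2.0, 9.0, 3.0], sdcoef=2.0):
    print(row)
```

An example whose cost stands far above the running mean is rejected and does not alter the statistics, which is the kind of outlier the cost plot is meant to expose.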

The third printout shows the cost to modify particular sets of transition
letter sets ("MACHINE NUMBER" on the printout) to accommodate a
particular example. A cost of 0.0 indicates that no modification was
required. A "*" marks the birth of a new machine; that is, it shows
that a previous machine was modified to accommodate the example.

User  Dialog: 

GZEC 

ENTER  NAME  OF  EXAMPLE  SPACE  FILE: 

ENTER  NAME  OF  SUBDIRECTORY  WHERE 
TEMPORARY  FILES  ARE  TO  RESIDE: 

ENTER TWO DIGIT VOCABULARY ITEM NUMBER:

(If  temporary  files  already  exist,  the  system  queries) 

FILES ALREADY EXIST: TR**--.TM
MAY I DELETE THEM? (Y OR N):

(A "N" response causes the system to STOP. Rename the .TM files before
resuming processing.)

ENTER  LISTING  FILE 
(P -> $LPT, D -> DISK):

(If  the  "D"  option  is  selected,  the  listing  file  name  is  constructed 
from the example space file name with the .LS extension. If this
listing  file  already  exists,  the  system  queries) 

MAY  I  DELETE  filename?  (Y  OR  N): 

(A  "N"  response  causes  the  system  to  STOP.  Rename  .LS  file  before 
resuming  processing) 

PRESENT  VALUE  OF  SDCOEF  IS  XX. X 
DO  YOU  WANT  TO  CHANGE  SDCOEF?  (Y  OR  N): 

(This is the value which controls modification of the transition
letter sets. The modification is allowed if the cost is < the mean
cost + SDCOEF standard deviations. If "Y" is entered, the system
responds.)

ENTER  SDCOEF: 

DO  YOU  WANT  TO  USE  THE  COST  WEIGHT  FACTORS?  (Y  OR  N) 

(IF NOT, ALL WEIGHTS ARE 1.0. ENTER 'N' FOR HAND MARKED DATA):

(Weighting factors are used to reduce the contribution of extraneous
end data to the final set of transition letter sets.)

IS  THERE  AN  EXISTING  MACHINE 
WHICH  IS  TO  BE  GENERALIZED?  (Y  OR  N): 

(This option allows an existing machine to be generalized to accommodate
new examples. A "Y" response causes the system to prompt:)

ENTER FILE NAME (SUBDIR:NAME):

(If  the  "N"  response  was  given  to  the  former  question,  the  system 
begins  the  search  for  a  good  starting  point  in  the  data.  If  the 
value  of  SDCOEF  is  sufficiently  small,  etc.,  the  first  pass  through 
the  example  space  may  not  yield  a  starting  set  of  transition  letter 
sets  which  satisfy  the  conditions.  In  this  case  the  system  notifies 
the  user  with) 

PAUSE NO INITIAL MACHINE FOUND. SHALL I TRY AGAIN?

(Strike any key to continue.)

STOP, ALL DONE!

Input Files:

ES-, the example space file
-.CD,  the  compressed  data  files 

MC**.TL,  optional  set  of  transition  letter  sets  which  is  to  be 
generalized 

Output  Files: 

$LPT or ES-.LS, the listing file

TRLS**.TM,  all  intermediate  sets  of  transition  letter  sets 
TRIX**.TM,  index  file  into  TRLS**.TM 

COSTF.TM,  temporary  file  of  costs,  deleted  after  graph  printouts 
FNFF.TM,  temporary  communication  file  between  GZEC  and  PRNT7. 

Error  Conditions: 

***WARNING: WORD TOO LONG***

FILE:  filename,  NLETR:  XX 

(The  system  protects  itself  against  overfilling  its  arrays  by 
verifying  that  the  utterance  to  be  processed  is  not  too  long.  The 
examples  which  are  too  long  must  be  edited  using  ESDIT.  Processing 
continues  to  the  next  file.) 

CKST  —  FILE  DOES  NOT  EXIST:  filename 

(If  a  file  is  given  in  the  example  space  which  cannot  be  found  at 
processing  time,  the  error  is  noted  on  the  printer  and  processing 
continues.)

CKST  —  UNKNOWN  ERROR:  XX  FILE:  filename 

(This  error  indicates  that  although  the  file  was  found,  it  cannot  be 
accessed for some reason which CKST is unable to remedy. Refer to
the RDOS manual for a description of error codes and file status
codes. Again this error is noted on the printer and processing
continues.)

GWRD  --  UNKNOWN  ERROR:  XX  FILE:  filename 

(If  GWRD  is  unable  to  read  the  compressed  data  file,  it  prints  this 
error  and  takes  the  error  return.  The  existence  of  the  file  is  not 
in  question  when  this  error  is  detected,  but  rather  some  other  file 
data  error  has  occurred.  The  most  likely  cause  is  an  illegal  start¬ 
ing  or  ending  record  specified  for  the  compressed  data  file  resulting 
from  an  error  introduced  in  editing  the  example  space.) 

4. RESCUE

Title: RESCUE.SV

Purpose:

RESCUE  retrieves  a  desired  set  of  transition  letter  sets  from  a  temp¬ 
orary  file  and  writes  it  into  a  machine  file. 

Since  the  final  set  of  transition  letter  sets  for  a  vocabulary  item 
may  not  be  the  best  one  due  to  the  inclusion  of  bad  examples,  all  the 
unique transition letter sets and the information to access them are
saved  in  temporary  files. 

RESCUE  prints  the  desired  set  of  transition  letter  sets  if  requested. 

RESCUE  deletes  the  temporary  files  if  requested. 

Printout : 

A  printout  of  the  set  of  transition  letter  sets  is  provided. 

User  Dialog: 

RESCUE 

ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

ENTER THE 2-DIGIT VOCABULARY ITEM # (00-28):

ENTER  THE  RESCUE  INDEX: 

(This  is  the  desired  set  of  transition  letter  sets  as  determined  from 
the  cost  graph  produced  by  GZEC.) 

IS THE MACHINE TO BE PRINTED (Y/N)?

(If  requested,  the  desired  set  of  transition  letter  sets  is  printed 
in  addition  to  being  written  into  the  machine  file.) 

CREATED  FILE 

(The  machine  file  name  is  displayed.) 

ARE  THE  FILES  AND 

TO  BE  DELETED  (Y/N)? 

(The user is queried whether the temporary files are to be deleted.

The temporary file TRLS**.TM saves all unique sets of transition
letter sets for vocabulary item **.

The temporary file TRIX**.TM contains the starting record numbers of
the transition letter sets in TRLS**.TM and the number of transition
letter sets in each set for vocabulary item **.

After  the  temporary  files  are  deleted,  if  so  requested,  the  following 
message  appears.) 

DELETED FILES AND

STOP PROCESSING COMPLETE

Input Files:

TRLS**.TM, the temporary file of sets of transition letter sets
for vocabulary item **.

TRIX**.TM, the temporary file of starting record numbers and the
number of transition letter sets for the sets of transition
letter sets stored in TRLS**.TM for vocabulary item **.

Output  Files: 

MC**.TL, the machine file of transition letter sets for vocabulary
item **.

Error  Messages: 

CKST  -  FILE  DOES  NOT  EXIST: 

(If  any  of  the  input  files  do  not  exist  for  the  vocabulary  item,  this 
message  is  output  and  the  program  terminates.) 

CKST  —  UNKNOWN  ERROR:  FILE: 

FILE  ALREADY  EXISTS: 

(If  the  machine  file  already  exists  for  the  vocabulary  item,  the 
program  terminates.) 

STOP  ON  ERROR 

(The  program  terminates  for  any  of  the  above  errors.) 

STOP  -  NO  SUCH  MACHINE 

(The  specified  machine  number  does  not  exist.  The  program 
terminates.)

5. SIGH


Title:  SIGH.SV 

Purpose : 

SIGH  checks  the  transition  letter  set  files  MC**.TL  created  by  RESCUE 
to  determine  if  the  number  of  transition  letters  in  each  MC**.TL  file 
is  less  than  thirteen.  If  the  number  is  less  than  thirteen,  nothing 
is  done;  if  this  number  is  greater  than  thirteen,  the  transition  let¬ 
ters  with  the  greatest  number  of  "?"  features  are  deleted  until  the 
remaining  number  of  transition  letters  is  smaller  than  thirteen. 
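
The reduction step can be sketched in Python as follows. This is an illustrative sketch rather than the SIGH implementation. The text does not say what happens when a file holds exactly thirteen transition letters, so the sketch assumes a limit of twelve ("smaller than thirteen"). The tie-breaking rule (earlier letters are dropped first among equals), the string representation of a letter, and the name reduce_letters are assumptions as well.

```python
MAX_LETTERS = 12  # assumption: "smaller than thirteen" means at most 12


def reduce_letters(letters, max_letters=MAX_LETTERS):
    """Drop the letters with the most "?" features until the count fits.

    letters: list of strings, one character per feature ("?" = indifferent).
    Returns (kept, was_reduced); the original order of the kept letters
    is preserved.
    """
    if len(letters) <= max_letters:
        return list(letters), False
    # Rank positions by "?" count, most first; ties go to the earlier letter
    # (the tie rule is an assumption).
    ranked = sorted(range(len(letters)),
                    key=lambda i: (-letters[i].count("?"), i))
    to_drop = set(ranked[:len(letters) - max_letters])
    kept = [letter for i, letter in enumerate(letters) if i not in to_drop]
    return kept, True


letters = ["1 1 "] * 12 + ["????", "1???"]
kept, reduced = reduce_letters(letters)
print(len(kept), reduced)
```

The boolean result corresponds to the two CRT messages in the dialog: "OK" when nothing was removed and "REDUCED" otherwise.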

Printout : 

None 

User  Dialog: 

SIGH 

ENTER  2-DIGIT  STARTING  MACHINE  NUMBER 
ENTER 2-DIGIT END MACHINE NUMBER

(SIGH  checks  each  transition  letter  set  in  order.  If  a  transition 
letter  set  need  not  be  reduced,  the  message) 

TRANSITION  LETTER  SET  FOR  THIS  ITEM  OK 

(appears  on  the  CRT.  If  a  transition  letter  set  is  reduced,  the 
message) 

TRANSITION  LETTER  SET  FOR  THIS  ITEM  REDUCED 
(appears.)

Input  Files: 

MC**.TL, transition letter set for item **

Output  Files: 

MC**.TL,  reduced  or  unmodified  transition  letter  set  for  item** 
MC**.XY,  non-reduced  transition  letter  set  for  item** 

Error Messages:

INVALID ENTRY - illegal machine number entered

FILE  OPEN  ERROR  -  could  not  open  MC**.TL  file. 

6. LOOPER

Title:  LOOPER.SV 

Purpose : 

LOOPER  finds  the  loop  letter  sets  for  a  particular  vocabulary  item. 
It  provides  a  printout  which  shows  the  sets  of  possible  loop  letter 
sets  for  each  example. 

Printout: 

In  LOOPER  each  example  from  an  example  space  is  printed  and  next  to 
it  the  sets  of  transition  and  loop  letter  sets.  The  loop  letter  set 
printout  is  identical  in  format  to  the  transition  letter  set  format, 
except  that  the  words  "EMPTY  SET"  appear  to  describe  this  condition 
(impossible  in  transition  letter  sets).  The  letter  sets  are 
identified in the far right-hand column. "T1" means transition
letter set 1, "L2" means loop letter set 2, and so on. In some
cases, empty loop letter sets are not shown because the transition
letter set printout takes precedence.

If  the  example  has  more  than  one  start  point,  this  printout  is  re¬ 
peated  for  the  subsequent  cases. 

The  final  set  of  loop  letter  sets  which  accommodates  at  least  one 
start  point  in  each  utterance  in  the  example  space  is  also  printed. 

User  Dialog: 

LOOPER 

ENTER  NAME  OF  EXAMPLE  SPACE  FILE: 

ENTER  2-DIGIT  VOCABULARY  ITEM  (00-10): 

ENTER  NAME  OF  SUBDIRECTORY 

WHERE  SET  OF  LOOP  LETTER  SETS  IS  TO  RESIDE 
(MUST BE 3 CHARACTERS):

ENTER  DESCRIPTION  OF  THIS  RUN: 

(The description entered here is printed in the header of the LOOPER
listing.)

STOP LOOPER IS FINISHED

Input Files:

ES-,  the  example  space  file 

MC**.TL,  the  set  of  transition  letter  sets  for  this  vocabulary  item 
-.CD,  the  compressed  data  files  specified  by  the  example  space. 

Output Files:

MC**.LP, the set of loop letter sets for this vocabulary item
LP**.TM, a temporary file of sets of loop letter sets for the different
starting points in the examples. This file is deleted after
processing is complete.

Error  Conditions: 

LOOP  LETTER  SETS  ALREADY  EXIST  FILE:  filename 

(This  fatal  error  results  when  the  specified  loop  letter  sets  already 
exist.  Delete  or  rename  the  specified  file  before  restarting 
LOOPER.) 

STOP LOOPER MUST HAVE A SET OF TRANSITION LETTER SETS

(Loop  letter  sets  cannot  be  generated  without  the  corresponding 
transition  letter  sets  (MC**.TL).) 

Other  file  data  error  conditions  are  identical  to  the  CKST  and  GWRD 
errors  described  in  the  GZEC  discussion. 

7. REVEXA


Title: REVEXA.SV

Purpose : 

REVEXA is version A of the revised research machine exerciser. Essentially,
it is a stripped-down version of REVEX whose purpose is to
collect counter data. In contrast to REVEX, however, REVEXA does not
allow any utterance with a transition or loop letter set violation to
proceed to recognition. For a more complete description of the operation
of REVEXA, see the program description for REVEX.

Printout : 

This  is  the  same  as  in  REVEX. 

User  Dialog: 

This  is  the  same  as  in  REVEX.  Here,  to  speed  execution,  the 
question: 

DO YOU WANT TO USE ONLY THE MACHINES IN THE UTTERANCE? (Y/N)

should  be  answered  "Y",  since  the  other  machines  would  only  contri¬ 
bute  artifacts,  and,  at  this  point,  data  about  artifacts  are  not 
used. 

Input  Files: 

MNSET*  -  the  magic  number  sets  to  be  used  by  REVEXA 

-.CD  -  compressed  data  files 

MC**.TL  -  transition  letter  set  for  item  ** 

MC**.LP  -  loop  letter  set  for  item  ** 

8. RVDIT


Title: RVDIT.SV

Purpose : 

RVDIT is the counter data file editor. From the counter data file
for a speaker (<SUB>:CDAT.RV), RVDIT creates a counter data file
for each machine number (<SUB>:CDAT**.RV, where ** is the
machine number).

The user can specify which counter data records are to be flagged as
"real" in the new files by supplying the file <SUB>:RVCARDS.

Printout:  None 

User  Dialog: 

RVDIT 

ENTER THE 3-LETTER SUBDIRECTORY NAME:

IS THERE A MACHINE 12 (Y/N)?

ARE THERE ANY COUNTER RECORDS TO BE MODIFIED (Y/N)?

(If  the  user  does  not  wish  to  keep  all  of  the  counter  data  records, 
then  file  RVCARDS  must  exist.) 

WARNING -- FILES subdirectory:CDAT**.RV WILL BE DELETED
FOR ALL MACHINE NUMBERS ** (00-11).

DO YOU WISH TO CONTINUE (Y/N)?

(If the user declines, the program terminates with STOP PROCESSING.

Otherwise, the specified files are deleted and the program
continues.)

STOP  ALL  DONE 

Input  Files: 

CDAT.RV  the  counter  data  file 

CIDX.RV the counter index file

RVCARDS  contains  the  record  numbers  of  the  counter  data  records  to 
be  flagged  as  "real"  in  the  new  files  to  be  created. 

Output  Files: 

CDAT**.RV the counter data files for machine number **.




NAVTRAEQUIPCEN 78-C-0141-1


9.  COVERT 


Title: COVERT.SV

Purpose: 

COVERT computes the covariance matrix, median, delta lower, and delta
upper of the counters for each specified machine. It also calculates
the  coefficients  of  correlation  for  the  non-diagonal  upper  triangular 
elements  of  the  covariance  matrix. 
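
The covariance and correlation computations can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the COVERT code: it assumes each counter data record is a list of counter values, uses the sample (n - 1) divisor, and leaves out delta lower and delta upper because their definitions are not given in this section. The function names are hypothetical.

```python
from math import sqrt
from statistics import median


def covariance_matrix(records):
    """Sample covariance matrix of counter records (rows = observations)."""
    n = len(records)
    k = len(records[0])
    means = [sum(r[j] for r in records) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in records)
            cov[i][j] = cov[j][i] = s / (n - 1)  # assumption: n - 1 divisor
    return cov


def upper_correlations(cov):
    """Correlation coefficients for the off-diagonal upper triangle.

    Raises ZeroDivisionError if a counter has zero variance.
    """
    k = len(cov)
    return {(i, j): cov[i][j] / sqrt(cov[i][i] * cov[j][j])
            for i in range(k) for j in range(i + 1, k)}


records = [[2, 4, 1], [4, 8, 3], [6, 12, 2]]  # made-up counter values
cov = covariance_matrix(records)
corr = upper_correlations(cov)
medians = [median(column) for column in zip(*records)]
print(cov, corr, medians)
```

Each correlation coefficient is the covariance element divided by the square root of the product of the two corresponding diagonal (variance) elements.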

Printout: 

For  each  specified  machine,  COVERT  prints  for  each  counter  position 
the  selected  counter  equation,  the  ordered  C-values,  median,  delta 
lower,  and  delta  upper.  It  prints  the  calculated  covariance  matrix 
and  the  coefficients  of  correlation  for  the  covariance  matrix. 

User  Dialog: 

COVERT 

ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

ENTER  THE  2-DIGIT  STARTING  MACHINE  NUMBER  (00-15): 

ENTER  THE  2-DIGIT  END  MACHINE  NUMBER  (00-15): 

(COVERT  creates  the  covariance  matrix  files  for  the  machines  in 
ascending  order,  beginning  with  the  starting  machine  number  and 
finishing  with  the  end  machine  number. 

The  current  limits  on  the  machine  numbers  are  0-15). 

CREATED  FILE: 

(The covariance matrix file name for the machine is displayed.)

STOP-ALL DONE FOLKS!

Input  Files: 

CDAT**.RV,  the  counter  data  file  for  machine  ** 

Output  Files: 

CM**,  the  covariance  matrix  file  for  machine  **. 

COV.ST,  the  counter  data  statistics  file.  A  record  of  counter  data 
statistics  (medians,  delta  uppers,  delta  lowers,  equation  flags)  is 
written  for  each  machine  processed  by  COVERT. 

Error Messages:

INVALID  ENTRY 

(Invalid  starting  or  end  machine  numbers  were  entered.  Another  input 
is  requested.) 

CKST — FILE  DOES  NOT  EXIST: 

(If  the  input  file  does  not  exist  for  the  machine  being  processed, 
this  message  is  output  and  the  processing  is  skipped  for  this 
machine.)

CKST  —  UNKNOWN  ERROR:  FILE: 

FILE  ALREADY  EXISTS: 

(If  the  covariance  matrix  file  already  exists  for  the  machine  being 
processed,  this  message  is  output  and  the  processing  is  skipped  for 
this machine.)

10. INVERT


Title: INVERT.SV

Purpose: 

INVERT  calculates  the  inverted  covariance  matrix  for  each  specified 
machine.  It  also  computes  the  eigenvalues  of  the  covariance  matrix 
for  each  specified  machine  if  so  desired. 
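
The inversion step can be sketched in Python as follows. The report does not state which numerical method INVERT uses; this sketch uses Gauss-Jordan elimination with partial pivoting purely as an illustration, and it omits the optional eigenvalue computation. The function name and the tolerance value are assumptions.

```python
def invert_matrix(matrix, tol=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment a floating-point copy of the matrix with the identity.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot candidate in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < tol:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


print(invert_matrix([[4.0, 2.0], [2.0, 3.0]]))
```

A covariance matrix built from too few or linearly dependent counter records is singular; the sketch reports that case with an exception instead of returning a meaningless inverse.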

Printout: 

INVERT  prints  the  eigenvalues  of  the  covariance  matrix  when  the 
option  to  compute  the  eigenvalues  is  chosen. 

User  Dialog: 

INVERT 

ENTER  THE  2-DIGIT  STARTING  MACHINE  NUMBER  (00-29): 

ENTER  THE  2-DIGIT  ENDING  MACHINE  NUMBER  (00-29): 

PRINT  EIGENVALUES  OF  COVARIANCE  MATRIX  (1-YES,  0-NO): 

(INVERT  creates  the  inverted  covariance  matrix  files  for  the  machines 
in  ascending  order,  beginning  with  the  starting  machine  number  and 
finishing  with  the  end  machine  number.) 

The  current  limits  on  the  machine  numbers  are  0-29. 

Input  Files: 

CM**, the covariance matrix file for machine **.

Output  Files: 

INCM**,  the  inverted  matrix  file  for  machine  **. 

Error  Messages: 

INVALID  ENTRY 

(Invalid  starting  or  end  machine  numbers  were  entered.  Another  input 
is  requested.) 

FILE  OPEN  ERROR 

(The  covariance  matrix  file  cannot  be  opened.) 

11. CROAK

Title: CROAK.SV

Purpose : 

CROAK calculates the delta and mu values for each counter data record
of  a  machine  number  and  then  orders  and  prints  the  delta  and  mu  val¬ 
ues.  CROAK  also  creates  the  file  RVX.ST  of  statistics  used  by  REVEX. 

Printout: 

For  each  vocabulary  item,  CROAK  prints  out  the  inverted  covariance 
matrix  for  that  item,  its  determinant,  and  the  delta  and  mu  values  in 
unordered  and  in  sorted  form  with  their  computed  mean  and  standard 
deviation.  CROAK  also  plots  the  cumulative  distributions  of  the 
delta  and  mu  values. 
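
The ordering and summary steps of this printout can be sketched in Python as follows. The sketch does not compute delta or mu themselves, since their definitions are not given in this section; it only shows sorting a list of values, computing the mean and standard deviation, and forming the points of a cumulative distribution. The sample standard deviation and the name summarize are assumptions.

```python
from statistics import mean, stdev


def summarize(values):
    """Return sorted values, mean, sample standard deviation and CDF points."""
    ordered = sorted(values)
    n = len(ordered)
    # Rank-based cumulative distribution: the i-th smallest value is
    # paired with the fraction i / n.
    cdf = [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]
    return ordered, mean(ordered), stdev(ordered), cdf


ordered, avg, sd, cdf = summarize([3.0, 1.0, 2.0, 4.0])  # made-up values
print(ordered, avg, sd)
for value, fraction in cdf:
    print(value, fraction)
```

Plotting the second element of each pair against the first gives the cumulative distribution curve.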

User Dialog:

CROAK 


ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

(Then  the  program  requests  the  user  to  enter  the  data  extraction 
mode:)

ENTER  THE  DATA  STATISTICS  EXTRACTION  MODE 

1  (Mode  1  -  REAL  RECOGNITIONS  WITH  VIOLATIONS) 

2  (Mode  2  -  ALL  REAL  RECOGNITIONS) 

3  (Mode  3  -  REAL  RECOGNITIONS  WITHOUT  VIOLATIONS) 

4  (Mode  4  -  ARTIFACTS  ONLY) 

(If  the  user  enters  anything  but  1,  2,  3,  or  4,  the  message  "INVALID 
ENTRY"  appears,  and  the  user  is  asked  again  for  a  mode  number.) 

(If  any  mode  but  4  is  chosen,  the  user  is  asked  if  he  wishes  to  save 
the  CROAK-generated  statistics:) 

DO YOU WISH TO SAVE THE PROBABILITY STATISTICS? (Y/N)

(Then the program requests starting and ending machine numbers:)

ENTER 2-DIGIT STARTING MACHINE # (00-15)

ENTER 2-DIGIT END MACHINE # (00-15)

(If  an  illegal  entry  is  made,  the  message  "INVALID  ENTRY"  appears  and 
the  user  is  asked  again  for  starting  and  end  machine  numbers.) 


When the delta values have been computed and sorted, the message

DELTA VALUES ARE STORED IN FILE _

appears, and when the mu values have been computed and sorted, the message

MU VALUES ARE STORED IN FILE _

appears. The message

STOP-ALL DONE FOLKS!

appears when the processing is complete.

Input  Files: 

INCM**  -  inverted  covariance  matrix  file  for  item  ** 

CDAT**.RV  -  counter  data  file  for  item  ** 

CIDX.RV - index file for the counter data file

COV.ST - counter data statistics file

Output Files:

QDAT**.RR - file of delta values for reals

QDAT**.AF - file of delta values for artifacts

MUDT**.RR - file of mu values for reals

MUDT**.AF - file of mu values for artifacts

RVX.ST - statistics file for REVEX

Error Messages:

DELTA  FILE  FOR  THIS  ITEM  ALREADY  EXISTS: 

MU  FILE  ALREADY  EXISTS  FOR  THIS  ITEM 

(Either QDAT** or MUDT** already exists and should be deleted or
renamed before invoking CROAK again.)

PROBLEM CREATING _

(CROAK was not able to create the named file.)

CKST  —  FILE  DOES  NOT  EXIST 
CKST  --  UNKNOWN  ERROR 

12. REVEX

Title: REVEX.SV

Purpose : 

REVEX  is  the  revised  research  machine  exerciser.  It  was  designed  to 
serve  one  of  many  functions  depending  upon  the  particular  subroutines 
loaded  with  it.  The  version  implemented  for  this  phase  of  the  proj¬ 
ect  is  the  counter  data  extraction  version.  It  can  operate  using  any 
number  of  machines.  For  each  utterance,  it  finds  all  machines  which 
go  to  recognition  and  saves  the  counter  data  collected,  that  is  the 
number  of  letters  which  occurred  in  each  transition  and  loop  letter 
set  for  every  machine  which  went  to  recognition.  In  this  version, 
both  loop  and  transition  letter  violations  are  allowable. 

Printout: 

REVEX  provides  three  types  of  printout.  The  first  provides  a  de¬ 
tailed  history  of  the  progress  of  each  copy  of  each  active  machine. 

It  is  read  as  follows.  The  machine  number  is  shown  in  the  heading. 
Machine  0  is  that  constructed  for  the  word  zero.  Machine  10  is  that 
which recognizes "point" and is shown as "P" in printout 3. When two
separate machines exist for a single vocabulary item, their histories
are combined under the one column for that machine. The initial or
universal machine start is marked by a "Z", while the non-initial
version starts are marked with the typical "S".

The  stage  of  each  machine  is  shown  for  each  letter  in  the  utterance 
(the  letter  number  appears  on  the  left  and  can  be  correlated  with  the 
utterance  printout  given  in  printout  3).  The  stage  is  described  by  a 
single  number  or  letter,  as  shown  in  Table  A2.  (If  the  numerical 
stage  exceeds  9,  only  the  units  digit  is  printed.)  Thus  the  progress 
of  a  copy  of  a  machine  can  be  traced  by  simply  following  the  speci¬ 
fied  print  column.  Note  that  when  the  number  of  copies  of  a  machine 
exceeds the space available, data for copies in the next print column
is  shifted  to  the  right  to  allow  data  for  all  copies  to  be  printed. 

Special  symbols  are  used  in  addition  to  the  stage  descriptors.  Their 
meaning  is  shown  in  Table  A3. 

Finally,  a  line  appears  at  the  end  indicating  which  machine  copies  in 
the  final  stage  were  forced  to  recognition  at  the  end  of  the 
utterance. 

Printout  2  lists  relevant  data  about  the  recognitions,  including  the 
loop  letter  violations.  The  "start  order"  values  are  used  to  correl¬ 
ate  these  data  with  printout  3.  The  universal  machine  descriptors 
are user selectable. Here "2U" distinguishes that machine from the
non-initial  version  ("2"). 

Printout  3  gives  the  utterance  and  maps  the  recognitions  onto  it  in 
order  of  start  time.  The  characters  refer  to  the  machine,  as  dis¬ 
cussed  for  printout  1.  The  symbol  marks  the  time  the  machine 

spent  in  its  last  stage. 

TABLE A2. Stage and State Descriptors Used in REVEX Type 1 Printout

Stage        State
             Transition    Loop

  1              1          A
  2              2          B

 10              0          J
 11              1          K


TABLE A3. Meaning of Special Symbols in REVEX Type 1 Printout

Symbol Category   Symbol   Meaning

Machine start     S        The violation-free start of a particular
                           copy of a machine. Not used for universal
                           machines. Always stage 1.

                  Z        The violation-free start of a particular
                           copy of the universal version of a machine.
                           Always stage 1.

                  $        Machine copy start on a transition letter
                           violation. Can occur in stage 1 only for
                           the first letter of the utterance. Thereafter,
                           the stage is one greater than the
                           parent's stage.

Parent copy       0        Marks the parent copy of the "S" to the
                           left of this copy. Indicates parent is in
                           T state.

                  s        Parent in L state.

                  ■        Parent in the L state with an acceptable
                           violation this letter.

                  /        Parent dropped due to excessive loop
                           violations.

TABLE A3. Meaning of Special Symbols in REVEX Type 1 Printout (Cont)

Symbol Category   Symbol   Meaning

Violations        $        Transition letter violation within acceptable
                           limits such that a new copy (the offspring)
                           was started.

                           L state, letter not in L (i.e., an L
                           violation).

Dropped copies    X        Copy dropped due to excessive L violations.

                           Copy dropped due to excessive L violations
                           after having sired an offspring.

                           Copy not selected for advancement to the
                           next stage because a) a better copy was
                           advanced; or b) a copy in the next stage
                           was better than this copy.

                  \        Copy dropped because a copy advancing to or
                           created in this stage is a better copy.

                  *        Copy dropped after recognition.

Final Stage       ?        Copy in final stage delay, awaiting
                           recognition.

Recognition       *        Recognition

User Dialog:

REVEX

ENTER NAME OF SUBDIRECTORY
WHERE DATA FILES RESIDE
(MUST BE 3 CHARACTERS):

ENTER  MODE  OF  DATA  ACQUISITION 
(MODE  1  -  ESG  FORMAT  FILE 
MODE 2 - LIVE UTTERANCE
MODE  3  -  MAGIC  NUMBER  SET 
MODE  4  -  INDIVIDUAL  -.CD  FILES): 

(The live utterance option will be implemented in a subsequent phase.
If mode 1 is selected, the system responds)

ENTER NAME OF EXAMPLE SPACE:

(If mode 3 is selected, the system requests)

ENTER NAME OF MAGIC NUMBER SET:

(The system searches for and reads the machines, and explains the
brief pause to the user)

READING  INDIVIDUAL  MACHINES  .  .  . 

(User  dialog  continues  after  machines  are  found) 

EXTRA  PRINTOUT  TO  $LPT?  (Y  OR  N): 

(This  option  allows  MINIMINT  printout  to  be  directed  to  a  disk  file 
RV.LS  if  "N"  is  entered.  If  this  listing  file  already  exists,  the 
system  asks) 

LISTING  FILE  EXISTS 
MAY  I  DELETE  IT?  (Y  OR  N): 

(The  "N"  response  causes  the  system  to  open  the  existing  listing  file 
for  appending.) 

DO  YOU  WANT  THE  LONG  STATE  PRINTOUT?  (Y  OR  N): 

(A  "N"  response  causes  the  printout  described  as  type  1  to  be 
suppressed.)

DO YOU WANT THE MACHINE OVERLAP PRINTOUT? (Y OR N):

(A "N" response causes the type 3 printout to be suppressed.)




FILES EXIST, FILES: SUB:CIDX.RV SUB:CDAT.RV
MAY I DELETE THEM? (Y OR N):

(This message is output if counter data files already exist on the
specified subdirectory. A "N" response causes the system to append
the new data to the existing files, and the message is output,)

APPENDING  TO  EXISTING  FILES 

(When  a  machine  is  found  for  a  vocabulary  item  above  10,  the  system 
asks,)

MACHINE  **  FOUND.  IT  IS  ASSUMED  TO  BE  AN  INITIAL  MACHINE. 

ENTER  VOCABULARY  ITEM  TO  WHICH  IT  CORRESPONDS  (0-10): 

(The system then requests a descriptor for use in type 2 and 3
printouts,)

ENTER 2 CHARACTER DESCRIPTOR FOR MACHINE
(E.G. '2U'):

(Finally, the starting time for the non-initial version is
requested,)

ENTER TIME ** SHOULD STOP AND XX BEGIN:

(The  system  offers  the  option  of  activating  only  those  machines  which 
are actually in the utterance. To use all machines, enter "N".)

DO  YOU  WANT  TO  USE  ONLY  THE  MACHINES  IN  THE  UTTERANCE?  (Y  OR  N): 

(A  pause  for  system  initialization  follows,  and  then  the  system 
requests,)

ENTER  DESCRIPTION  OF  THIS  RUN: 

(The user may enter a 40-letter descriptive string which is printed
in the header.)

ENTER NAME OF UTTERANCE FILE
(OR '*' TO TERMINATE):

(This request is made for every utterance if mode 4 was selected
above. Otherwise no further user inputs are required.)

STOP  REVEX  IS  FINISHED 

Input Files:

MC**.TL,  the  sets  of  transition  letter  sets  for  the  vocabulary  items 
of  interest. 

MC**.LP,  the  sets  of  loop  letter  sets 
-.CD,  the  compressed  data  files. 

ES-  or  MNSET*,  example  space  or  number  set  files  if  either  form  of 
entry  was  selected. 

COV.ST,  counter  data  statistics 
INCM**,  inverted  covariance  matrix  files 
RVX.ST,  statistics  file  created  by  CROAK 

Output  Files: 

CIDX.RV,  the  index  file  into  CDAT.RV  in  which  the  data  for  each 
utterance  are  kept. 

CDAT.RV, the set of counter data, start and end times, and loop
violations for each machine which goes to recognition.

Error  Messages: 

ILLEGAL MODE

ENTER MODE.

(This message is printed when the mode is not in the range 1 ≤ mode ≤ 4.)


****WARNING: -.RV FILES ARE NOT COMPATIBLE

CURRENT  BYTES:  XX,  OLD  BYTES:  XX 

STOP 

(This message occurs when an attempt is made to append to counter
data files created under a different revision of REVEX. The differing
record sizes make it impossible to append. The old counter files
must be deleted or renamed.)

Filename  DOES  NOT  EXIST 

(This message appears when a particular machine is not found. REVEX
assumes this machine is not to be used and continues. REVEX can
operate with 1 to 13 machines in its present configuration.)

NO SPACE IS AVAILABLE TO INSERT A MACHINE COPY FOR

VOCABULARY  ITEM  #:  XX 

(This  message  is  output  to  the  printer  when  no  space  is  available  in 
the  machine  copy  data  array.  The  size  of  this  array  must  be  changed 
to  accommodate  extra  copies  if  this  error  is  encountered.) 

CKST and GWRD file data errors are the same as those described for
GZEC.


13. ADDER


Title: ADDER.SV

Purpose:

ADDER lists the transition and loop letter set violations by vocabulary
item for each utterance which has been processed by REVEX. This
violation data is stored in two tables: one for real recognitions
and one for artifacts.

Printout : 

If  desired,  ADDER  prints  violation  data  for  each  utterance  processed 
as  well  as  the  two  violation  tables. 

User  Dialog: 

ADDER 

(The  program  then  responds:) 

PROGRAM  ADDER 

DO YOU WANT TO PLACE OUTPUT ON A DISK FILE? (Y/N)

(If  the  answer  is  yes,  the  program  responds:) 

ENTER  DESIRED  FILENAME  (1b  CHAR  MAX) 

(If  a  bad  filename  is  chosen,  the  program  prompts:) 

SOMETHING IS WRONG WITH YOUR CHOICE OF FILENAME.

CHOOSE  ANOTHER. 

Then:

ENTER RELEVANT COMMENT (40 CHAR MAX)

ENTER DISK CONTAINING CDAT, CIDX DATA FILES (3 CHAR)

(e.g., enter "DP2")

ENTER SUBDIRECTORY LOCATION OF CDAT, CIDX DATA FILES (3 CHAR)

DO YOU WANT THE LONG FORM PRINTOUT? (Y/N)

(ADDER then proceeds to process utterances one by one, storing
violation data in the two violation matrices.)

Input Files:

CDAT.RV counter data file from REVEX
CIDX.RV index file for the counter data file

Output Files:

LTVF.MM file containing violation table used by DEALER

Error Messages:

STOP  PROBLEM  WITH  LTVF.MM 

(There  was  a  problem  opening  the  file  LTVF.MM) 

KARMA  —  FILE  DOES  NOT  EXIST 

(Either CDAT.RV or CIDX.RV does not exist)

KARMA  —  UNKNOWN  ERROR 

(Problem with status of CDAT.RV or CIDX.RV files)


14. AVRAJ


Title: AVRAJ.SV

Purpose : 

AVRAJ  computes  average  word  length  for  real  recognitions. 

Printout : 

AVRAJ  prints  out  the  average  word  length  for  each  machine. 
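
The averaging that AVRAJ performs can be sketched as follows. The record layout of CDAT.RV is not given in this manual, so the tuple fields used here (item, start, end, is_real) are hypothetical stand-ins, and word length is assumed to be end time minus start time.

```python
from collections import defaultdict

def average_word_lengths(records):
    """Average word length per vocabulary item over real recognitions.

    `records` is an iterable of (item, start, end, is_real) tuples. This
    layout is an assumption for illustration, not the CDAT.RV format.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for item, start, end, is_real in records:
        if not is_real:          # artifacts are excluded from the average
            continue
        totals[item] += end - start
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in sorted(counts)}

sample = [
    (1, 0, 20, True),
    (1, 5, 35, True),
    (1, 0, 90, False),   # artifact, ignored
    (2, 40, 60, True),
]
averages = average_word_lengths(sample)
```

Only items with at least one real recognition appear in the result, so no division by zero can occur.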

User  Dialog: 

AVRAJ 

(The program responds:)

PROGRAM  AVRAJ 

ENTER  DISK  CONTAINING  CDAT,  CIDX  DATA  FILES  (3  CHAR) 

ENTER  SUBDIRECTORY  LOCATION  OF  CDAT,  CIDX  DATA  FILES  (3  CHAR) 

(The  program  then  proceeds  to  run  through  the  CDAT.RV  file, 
calculating  average  word  length  for  each  vocabulary  item  over  all 
real  recognitions  of  that  item.  When  this  process  is  complete,  the 
message  below  appears.) 

AVERAGE WORD LENGTHS ALSO EXIST ON BINARY FILE: AVRWRD.ST
STOP  AVRAJ  IS  FINISHED. 

Input  Files: 

CDAT.RV Counter data file created by REVEX
CIDX.RV Index file to CDAT.RV

Output Files:

AVRWRD.ST File of average word lengths

Error Messages:

KARMA  —  FILE  DOES  NOT  EXIST 

(Either CDAT.RV or CIDX.RV file does not exist on the specified
subdirectory.)

KARMA -- UNKNOWN ERROR

(Problem  with  status  of  CDAT.RV  or  CIDX.RV  files.) 

STOP  PROBLEM  OPENING  AVRWRD.ST 

(There was a problem opening AVRWRD.ST.)


15. CRAP


Title: CRAP.SV

Purpose:

CRAP determines the critical association factors for each vocabulary
item, going through each utterance processed by REVEX to check
associations occurring in the utterance at nine levels of overlap.

Printout:

For each of the nine levels of overlap, CRAP prints out a matrix of
overlap parameters. Also, the critical association parameters for
each vocabulary item are printed.


User  Dialog: 


CRAP 

(And the program responds:)

PROGRAM CRAP

DO YOU WANT TO PLACE OUTPUT ON A DISK FILE? (Y/N)

(If the user answers affirmatively, the program requests a filename.
The disk file, if chosen, receives all output that would otherwise go
to the printers.)

(Then the program asks for the location of the CDAT.RV and CIDX.RV
files:)

ENTER DISK CONTAINING CDAT, CIDX DATA FILES (3 CHAR):

(e.g., enter "DP2")

ENTER SUBDIRECTORY LOCATION OF CDAT, CIDX DATA FILES (3 CHAR)

(The program then asks for the machine types of machines 11 and 12:)

ENTER VOCABULARY TYPE FOR MACHINE 11: (I.E. '2' or '4')

ENTER VOCABULARY TYPE FOR MACHINE 12: (I.E. '2' or '4')

(An entry of -1 should be made if the machine is not being used.)

(Then, at the end:)

CRITICAL ASSOCIATION PARAMETERS HAVE BEEN OUTPUT TO CAP.ST
STOP CRAP FINISHED

Input Files:

AVRWRD.ST - file of average word lengths
CDAT.RV - counter data file
CIDX.RV - index file for the counter data file

Output Files:

CONGAP - file for contiguous real gap matrix
CAP.ST - file of critical association parameters

Error Messages:

STOP PROBLEM OPENING OUTPUT FILE - for disk file output
STOP PROBLEM OPENING CONGAP
STOP PROBLEM OPENING AVERAGE WORD FILE "AVRWRD.ST"

SOMETHING IS WRONG WITH YOUR CHOICE OF FILENAME. CHOOSE ANOTHER

(If a disk file output is chosen and the filename given is not
acceptable, another name is asked for.)


16. GAPSTER


Title: GAPSTER.SV

Purpose:

GAPSTER is responsible for determining the association, gap and delay
values; GAPSTER creates the gap matrix GAP.DT and the QASM matrix
needed in the MIND file.

Printout:

GAPSTER prints out time gap statistics for real and artifact recognitions
as well as the gap matrix and the QASM matrix.

User  Dialog: 

GAPSTER 

PROGRAM  GAPSTER 

DO  YOU  WANT  TO  PLACE  OUTPUT  ON  A  DISK  FILE?  (Y/N) 

(If  the  user  answers  affirmatively,  the  program  requests  a  filename. 
If  the  filename  chosen  is  bad,  the  program  asks  for  another  name.) 

(Then, just as in CRAP, the program asks for the location of the
CDAT.RV and CIDX.RV data files.

If these data files are found, the machine types of machines 11 and
12 are requested.)

ENTER VOCABULARY ITEM FOR MACHINE 11
ENTER VOCABULARY ITEM FOR MACHINE 12

(If a machine is not being used, enter "-1".
The program then asks for the critical association factor:)

ENTER  CRITICAL  ASSOCIATION  FACTOR 
(The  recommended  response  here  is  "1.0") 

(At  this  point,  the  user  is  asked  to  enter  the  total  number  of 
machines  to  be  used.) 

ENTER  TOTAL  NUMBER  OF  MACHINES  TO  BE  USED 

(For example, if machines 0-10 were being used, the response would be
"11".)

(After running a few minutes, the program halts and requests:)

ENTER REAL STD. DEV. SPREAD FACTOR FOR GAP MATRIX
(The recommended reply here is "1.0")

(A little bit later the user is asked:)

DO  YOU  WANT  THE  QUARTILE  AND  MEDIAN  CALUCLATIONS?  (Y/N) 

(These  calculations  are  not  required  and  may  be  omitted.) 

(Then, at the end:)

STOP GAPSTER IS FINISHED

Input Files:

CONGAP - file of contiguous real gap matrix

Output Files:

GAPMAX - file holding maximum gap value
GAP.DT - file holding gap matrix
QASM.DT - file holding QASM matrix

Error Messages:

STOP PROBLEM OPENING OUTPUT FILE - for disk listing file
STOP PROBLEM WITH OPENING CONGAP
STOP PROBLEM OPENING GAPMAX
STOP PROBLEM OPENING CAP.ST

FILE DOES NOT EXIST - the compressed data file in question does not
exist

STOP PROBLEM OPENING QASTMP - cannot open QASM.DT


17, 18. SORTRA, SORTRB

Title: SORTRA.SV, SORTRB.SV

Purpose:

SORTRA/SORTRB sorts the file GAPMAX/CONGAP, putting the entries in
ascending numerical order.


Printout:

SORTRA prints a plot of the ordered values of the file
GAPMAX, and SORTRB does the same thing for the file CONGAP.
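
The sort-then-plot behavior can be illustrated with a minimal sketch. The binary layout of GAPMAX and CONGAP is not described here, so the values are assumed to be already read into a list, and the text plot format is invented for illustration only.

```python
def sorted_plot(values, width=40):
    """Return the values in ascending order plus a crude text plot of them."""
    ordered = sorted(values)
    # Scale bars against the largest value; fall back to 1 to avoid dividing by 0.
    top = max(ordered) if ordered and max(ordered) > 0 else 1
    lines = []
    for v in ordered:
        bar = "*" * max(0, round(width * v / top))
        lines.append(f"{v:8.2f} |{bar}")
    return ordered, lines

ordered, lines = sorted_plot([12.0, 3.0, 7.5, 3.0])
print("\n".join(lines))
```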


User Dialog:


SORTRA (SORTRB)

(SORTRA/SORTRB proceeds to sort the entries in the file GAPMAX/CONGAP
and then plot the sorted values.)

Input  Files: 

GAPMAX  -  for  SORTRA 

CONGAP  -  for  SORTRB 

Error Messages:


None


19. MUTE

Title: MUTE.SV

Purpose:

MUTE computes the L-counter parameters MDLA0, MDLA1, and MDLA2.

Printout:

MUTE prints out the MDLA0, MDLA1, and MDLA2 values for each machine,
as well as the parameters PQ for reals and artifacts and for
reals and artifacts.

User  Dialog: 

MUTE 

ENTER  2-DIGIT  STARTING  MACHINE  NUMBER 
ENTER 2-DIGIT END MACHINE NUMBER

(MUTE proceeds to compute the MDLA** values for each machine beginning
with the starting number machine.)

Input  Files: 

MUDT**.RR  -  file  of  values  for  reals  of  item  ** 

MUDT**. AF  -  file  of  values  for  artifacts  of  item  ** 

Output  Files: 

LOOPY - file of MDLA** values for all machines
QLSTATS - file of PQ and values for all machines

Error Messages:

TOO MANY MU VALUES - the number of values exceeds 600
FILE OPEN ERROR -- one of the MUDT** files cannot be opened


20. GLOVE


Title: GLOVE.SV

Purpose:

GLOVE is a least squares routine designed to fit a curve through the
observed points of the cumulative distribution of the quality
function values for both real and artifact recognitions.

Printout : 

GLOVE prints out five coefficients for each real vocabulary item and
five coefficients for each artifact vocabulary item. These coefficients
are the "adjustable parameters" determined so that, with these
values used as coefficients in the general functional form, the fit
to a particular set of data points is best in the sense of least
squares.

User  Dialog: 

GLOVE 


ENTER  STARTING  MACHINE  NUMBER  (0-29) 

ENTER  END  MACHINE  NUMBER  (0-29) 

(GLOVE then proceeds to compute coefficients for each vocabulary
item, real and artifact, doing reals first in ascending order, then
artifacts in ascending order.)

Input Files:

QDAT**.RR - file of real delta values for item **
QDAT**.AF - file of artifact delta values for item **

Output Files:

QDFT**.RR - file of coefficients, median delta and range for reals
QDFT**.AF - file of coefficients and median delta for artifacts
APR - file holding number of real deltas for each vocabulary item
APA - file holding number of artifact deltas for each vocabulary item

Error Messages:

FILE STATUS ERROR FOR ITEM: _

TOO MANY DELTA VALUES FOR THIS ITEM

(The number of deltas exceeds available array size.)


21. TAILOR

Title: TAILOR.SV

Purpose : 

TAILOR calculates T-counter quality function values. TAILOR reads
coefficients and a range of values to fit from files QDAT**.RR and
QDAT**.AF (where ** is the machine type), then fits a quadratic to
the ratio of the real and artifact data. The coefficients of the
fitted curve are then transformed into a MEX-usable form and written
into the file WHAT.
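
As an illustration of the fitting step only, the following sketch fits a quadratic to a ratio of real to artifact values by least squares, using the normal equations and reporting the determinant of the normal matrix (the kind of quantity listed in the printout below). How TAILOR actually forms the ratio, any weighting it applies, and the transformation to MEX form are not described in this manual; the function names and sample data are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations.

    Returns (coefficients, determinant of the normal-equation matrix).
    """
    s = [sum(x ** k for x in xs) for k in range(5)]          # sums of x^0..x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[r + c] for c in range(3)] for r in range(3)]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("singular normal matrix; need 3+ distinct x values")
    coeffs = []
    for col in range(3):                                     # Cramer's rule
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs, d

# Made-up sample values; the ratio is exactly 1 + 1.5*x + 0.5*x**2.
deltas = [0.0, 1.0, 2.0, 3.0, 4.0]
real = [2.0, 6.0, 12.0, 20.0, 30.0]
artifact = [2.0, 2.0, 2.0, 2.0, 2.0]
ratio = [r / a for r, a in zip(real, artifact)]
coeffs, determinant = fit_quadratic(deltas, ratio)
```

Cramer's rule is used here because the system is only 3x3 and the determinant is wanted anyway; a near-zero determinant signals that the delta range holds too few distinct points for a quadratic.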

Printout : 

The  real  coefficients  from  QDAT**.RR. 

The range of delta values to be used.

The  coefficients  for  artifacts. 

The  range  of  delta  values  actually  used  in  the  fit. 

The  determinant  of  the  matrix  used  in  the  least  squares  fit. 

The  coefficients  of  the  fitted  curve. 

A  plot  of  the  fitted  curve  and  data  points. 

The  MDTA*  values. 

User  Dialog: 

TAILOR  [beginning  machine  number/B]  [ending  machine  number/E] 

If  /B  option  is  omitted,  beginning  machine  number  is  assumed  to  be  0. 

If  /E  option  is  omitted,  ending  machine  number  is  assumed  to  be  10. 

TAILOR  will  type  the  matrix  generated  by  the  least  squares  fit  for 
each  machine  type  processed. 

Input  Files: 

QDAT**.RR, coefficients for real recognition
QDAT**.AF, coefficients for artifacts, where ** goes from 00 to 10

Output Files:

WHAT, T-counter quality function values

Error Messages:

STOP - OPEN ERROR - QDAT**.-
STOP - READ ERROR - QDAT**.-


22. BUILDER

Title: BUILDER.SV

Purpose  : 

BUILDER builds the machine data file MDFL.MM from the input files
LOOPY  and  WHAT  created  by  MUTE  and  TAILOR  respectively. 

Printout : 

None 

User  Dialog: 

BUILDER

(BUILDER  then  proceeds  to  read  the  files  LOOPY  and  WHAT  and  then 
merge  them  to  create  MDFL.MM.  When  this  is  complete,  the  message 

MDFL.MM  CREATED 

appears  at  the  CRT.) 

Input  Files: 

LOOPY, file of MDLA* values created by MUTE
WHAT, file of MDTA* values created by TAILOR

Output Files:

MDFL.MM, machine copy data file needed by DEALER

Error Messages:

PROBLEM  OPENING  LOOPY  -  cannot  open  LOOPY 
PROBLEM  OPENING  WHAT  -  cannot  open  WHAT 

LOOPY  AND  WHAT  INCOMPATIBLE  -  the  number  of  entries  in  LOOPY  is  not 
the  same  as  the  number  of  entries  in  WHAT.  This  terminates  the 
program. 
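
The merge and its compatibility check can be sketched as below. The entry contents and the merged record shape are placeholders, since the layouts of LOOPY, WHAT, and MDFL.MM are not given in this manual; only the rule that the entry counts must match comes from the text above.

```python
def merge_machine_data(loopy, what):
    """Pair each machine's L-counter values with its T-counter values.

    `loopy` and `what` hold one entry per machine. The entry contents are
    placeholders; the real file layouts are not documented here.
    """
    if len(loopy) != len(what):
        # Mirrors the condition reported as LOOPY AND WHAT INCOMPATIBLE.
        raise ValueError("LOOPY AND WHAT INCOMPATIBLE")
    return [{"machine": n, "mdla": l, "mdta": t}
            for n, (l, t) in enumerate(zip(loopy, what))]

merged = merge_machine_data([(1, 2, 3), (4, 5, 6)], [(0.1, 0.2), (0.3, 0.4)])
```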


23. DEALER


Title: DEALER.SV

Purpose:

DEALER pulls together the various files created by CRAP, GAPSTER, and
BUILDER to create the file MIND.VD.

Printout:

None

User Dialog:

DEALER 

PROGRAM  DEALER 

ENTER  REVISION  NUMBER  FOR  THIS  JOB: 

(The  current  revision  number  is  "•0") 

(The  program  then  asks  for  number  of  vocabulary  items,  and  the  disk 
and  subdirectory  containing  the  data.) 

ENTER  NUMBER  OF  VOCABULARY  ITEMS  (0-13) 

ENTER  "DISK:  SUBDIR":  (9  CHAR.  MAX) 

(For  example,  the  user  might  enter  "DP2:ABC") 

If there are universal machines (machines 11 and 12), the program
requests

ENTER VOCABULARY ID AND END TIME FOR MACHINE _ SEPARATED BY COMMA

(So, for example, the user might enter "2,25" for machine 11, indicating
that machine 11 is the universal machine for vocabulary item 2,
and that the end time for this item is 25.)

Finally, if no errors occur, the user is asked for a comment to add to
the data file.

Then

LOOKS LIKE WE MADE IT, FOLKS

appears, and the MIND file is complete, except for the play factors.

Input Files:

LTVF.MM - transition/loop letter set violation table file

MC**.TL - transition letter set files

MC**.LP - loop letter set files

COV.ST - covariance statistics file

INCM** - inverted covariance matrix files

RVX.ST - statistics file created by REVEX

MDFL.MM - machine data file created by BUILDER

GAP.DT - gap matrix file

CAP.ST - critical association parameter file

Output File:

MIND.VD - incomplete MIND file

Error Messages:

ERROR NO. _ OCCURRED IN STAT CALL FOR FILE _

(The status of the named file is bad. The RDOS error code is used.)

STOP  —  TOO  MANY  STAGES 

(The  number  of  transition  letter  sets  is  too  large.) 

STOP  PROBLEM  WITH  COV.ST 

STOP  PROBLEM  WITH  RVX.ST 

STOP  PROBLEM  WITH  A  INCM**  FILE 

STOP  PROBLEM  WITH  MDFIL  -  MDFL.MM  file  is  bad 

STOP  PROBLEM  WITH  LTVF 

STOP  PROBLEM  WITH  QASM.DT 

STOP  PROBLEM  WITH  GAP.DT 

(Usually  an  error  of  this  kind  simply  means  that  the  named  file  does 
not  exist  on  the  subdirectory  in  question.) 

"MIND"  IS  WARPED  -  status  of  output  file  is  bad,  enter  another 


GIVE ME A NEW FILENAME


24. PHEW

Title: PHEW.SV

Purpose:

PHEW writes the three play factors to the end of the MIND file.

Printout:

PHEW prints out the a priori costs for each machine.

User Dialog:

PHEW 

ENTER  NUMBER  OF  VOCABULARY  ITEMS 

(PHEW  opens  the  MIND.VD  file  for  appending,  computes  a  priori  costs 
for  each  machine  and  writes  these  costs  to  the  end  of  the  MIND  file 
together  with  the  three  gap  matrix  play  factors.  When  this  is 
accomplished  the  message 

MIND  HAS  BEEN  CREATED 

appears on the CRT.)

Input  Files: 

MIND.VD,  the  data  file  created  by  DEALER 

APR, the file holding number of real deltas for each machine
APA, the file holding number of artifact deltas for each machine

Output Files:

MIND.VD, the complete MIND file

Error Messages:

PROBLEM  OPENING  MIND.VD  -  the  file  created  by  DEALER  cannot  be 
opened. 


VOCABULARY  ITEM  NUMBER  MISMATCH  -  the  number  of  vocabulary  items 
input does not match the number used in DEALER.


25. ESDIT (Auxiliary)


Title: ESDIT.SV

Purpose:

ESDIT is the example space file editor. It provides the capability
to change individual start/stop values in the example space using an
ESG or GENRLIZ printout. For a GENRLIZ printout, the starting record
number and the end record number of the compressed data entered by
the user must be offset.

Printout:

ESDIT produces a printout of the edit showing the element in the
example space which was changed, and the old and new values associated
with it.


User  Dialog: 

ESDIT 

ENTER  NAME  OF  EXAMPLE  SPACE  FILE: 

ARE  YOU  EDITING  FROM  AN  ESG  PRINTOUT?  (Y  OR  N): 

(If  the  user  answers  no,  then) 

ARE YOU EDITING FROM A GENRLIZ PRINTOUT? (Y OR N):

ENTER RECORD NUMBER:

FILE: , STARTING RECORD: , ENDING RECORD:

(This  is  a  statement  of  the  current  information  in  the  specified 
record  number.) 

ENTER  NEW  STARTING  RECORD: 

ENTER  NEW  ENDING  RECORD: 

ARE  THERE  OTHER  CHANGES  TO  THIS  EXAMPLE  SPACE  FILE?  (Y  OR  N): 

(If there are further changes, inputs of record number, new starting
record, and new ending record are requested.)

DO YOU WANT TO PROCESS ANOTHER EXAMPLE SPACE? (Y OR N):

(If another example space is to be processed, the user dialog is
repeated.)


26. ESGDIT (Auxiliary)


Title: ESGDIT.SV

Purpose : 

This  stand  alone  program  operates  on  an  existing  example  space  file 
to  produce  a  new  example  space  file  in  which  all  utterances  beginning 
with  the  vocabulary  item  specified  are  omitted. 

Printout: 

A hardcopy listing of utterances used and those omitted is produced.

User Dialog:

ESGDIT

ENTER NAME OF EXAMPLE SPACE:

ENTER  VOCABULARY  ITEM  (0...P): 

ENTER  NEW  EXAMPLE  SPACE  NAME: 

OLD  FILE  DESCRIPTION:  file  description 
ENTER  NEW  FILE  DESCRIPTION: 

STOP  ESGDIT  IS  FINISHED 

Input  Files: 

ES the example space file

Output Files:

ES the new example space file.

Error  Messages: 


FILE  ALREADY  EXISTS,  FILE:  file  name 
ENTER NEW EXAMPLE SPACE NAME.


27. GASP (Auxiliary)


Title: GASP.SV

Purpose : 

GASP  (the  Great  American  Speech  Printout  routine)  was  created  during 
the  closing  moments  of  the  LCSR  project  phase  0  for  the  final  report. 
For  each  specified  machine  number,  GASP  prints  the  transition  letter 
sets  or  the  merged  transition  and  loop  letter  sets. 

Printout : 

GASP  prints  out  either  transition  letter  sets  or  merged  transition 
and  loop  letter  sets  for  each  specified  machine  number. 

User  Dialog: 

GASP 

ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

ARE THE TRANSITION AND LOOP LETTER SETS TO BE MERGED IN THE PRINTOUT
(Y/N)?

(If  the  letter  sets  are  not  merged,  only  the  transition  letter  sets 
are  printed  for  the  machine  number.) 

ENTER  THE  STARTING  MACHINE  NUMBER: 

ENTER  THE  END  MACHINE  NUMBER: 

(GASP  prints  the  machines  in  ascending  order,  beginning  with  the 
starting  machine  number  and  finishing  with  the  end  machine  number. 

The  current  limits  on  the  machine  numbers  are  0-15.) 

STOP -ALL  DONE 

Input  Files: 

MC**.TL,  Transition  letter  set  file  for  the  machine  number  ** 

MC**.LP,  Loop  letter  set  file  for  the  machine  number  ** 

Output  Files: 

N/A 

Error  Messages: 

INVALID  ENTRY 

(Invalid  starting  or  end  machine  numbers  were  entered.  Another  input 
is requested.)

CKST -- FILE DOES NOT EXIST:

(If any of the input files do not exist for machine number **, this
message is output and no machine printout is made.)

CKST -- UNKNOWN ERROR; FILE:


28. GWIZ (Auxiliary)


Title: GWIZ.SV

Purpose:

GWIZ is an auxiliary investigative program which delineates the words
within an utterance using the given WIZARD statistics. It prints the
compressed data file, blatantly noting the delineations. The compressed
data files to be used are listed in SUB:GWIZ.CD.

Printout:

GWIZ prints the compressed data files noting the delineations.

User  Dialog: 

GWIZ 

ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

STOP 

Input Files:

GWIZ.CD, file containing the compressed data file names.

WIZ.ST,  file  of  length  and  stretch  factors  which  resides  on  the  main 
directory. 

-.CD,  the  specified  compressed  data  files. 

Output  Files: 

None 

Error  Messages: 

CKST -- FILE DOES NOT EXIST

CKST -- UNKNOWN ERROR: FILE:

(If a file error is detected on file WIZ.ST, then GWIZ terminates
with the message

STOP - FILE WIZ.ST DOES NOT EXIST

GWIZ also terminates on an error from file SUB:GWIZ.CD. GWIZ outputs
the error message and continues processing on an error from a
compressed data file.)


29. MEND (Auxiliary)


Title: MEND.SV

Purpose: 

MEND is an auxiliary program which creates example spaces for all the
vocabulary items from the handmarked input data obtained from the
GWIZ printout. These are the example spaces to be input to GZEC.

Programs GWIZ and MEND bridge the need to execute WIZARD and ESG
given a WIZARD statistics file.

Printout : 

None 

User  Dialog: 

MEND 

ENTER  THE  3-LETTER  SUBDIRECTORY  NAME: 

WARNING — FILES  ES$<SUB>$**  WILL  BE  DELETED  FOR  ALL  MACHINE 
NUMBERS ** (00-11).

DO  YOU  WISH  TO  CONTINUE  (Y/N)? 

(If "N" is specified, the program terminates with STOP PROCESSING.
Otherwise, the specified files are deleted and the program continues.)

STOP  ALL  DONE 

Input  Files: 

MEND.WD,  file  of  handmarked  input  data  to  create  the  example  spaces. 

The handmarked input data file is organized as follows: Two lines
are associated with data coming from each -.CD file. The first line
(beginning in column 1) contains the number of words in the utterance
and the -.CD filename. (For example, the utterance "1234" might produce
'4,LHN:A1234.CD', where LHN is the subdirectory which holds the
-.CD files and A1234.CD is the relevant compressed data file.) The
second line entry has the format: machine number, beginning record,
end record, separated by commas, for each word in the utterance.
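
A two-line entry of this form could be read as in the following sketch. The function name is hypothetical, and the check that line 2 holds exactly three values per word is an assumption drawn from the format description rather than documented MEND behavior.

```python
def parse_mend_entry(line1, line2):
    """Parse the two-line MEND.WD entry for one utterance.

    Returns (filename, [(machine, begin, end), ...]).
    """
    count_text, filename = line1.split(",", 1)
    n_words = int(count_text)
    values = [int(v) for v in line2.split(",")]
    if len(values) != 3 * n_words:
        raise ValueError("expected machine,begin,end for each word")
    words = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return filename.strip(), words

name, words = parse_mend_entry("4,LHN:A1234.CD",
                               "1,1,25,2,20,50,3,45,80,4,70,100")
```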

Output  Files: 

ES$<SUB>$**, example spaces for machine **, ** = 0,11

Error Messages:

CKST -- FILE DOES NOT EXIST:

CKST -- UNKNOWN ERROR: FILE:

(If there is an error from the input file MEND.WD, the program
outputs the message and terminates with)

STOP ON ERROR


A.9 FILE DESCRIPTION OF VDGS USER-CREATED FILES

File descriptions of VDGS user-created files are presented on the following
pages.


Name: Magic number sets

File: MNSET* where * is a 1-character number set identifier.


Description: The card images for the number sets include the number
spoken and the file name of the compressed data file as
shown below. The digits plus the word "point" comprise the
base vocabulary.

Note that the list includes 2 to 4 word numbers and that
each list has the following properties:

1. Every digit occurs 15 times; the word "point" occurs 14
times.

2. Every digit occurs first 4 times and last 4 times; so
does the word "point".

3. Every transition between two digits (e.g., 67, 68, 99,
etc.) and between a digit and the word "point," and
between the word "point" and a digit, occurs exactly
once in each set.

For data collection purposes, each set is augmented with the
eleven base vocabulary words; hence each set consists of 55
numbers (including the single word "point").
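
The three properties are mutually consistent, which a short tally makes visible. The sketch below is an illustration, not part of VDGS: it encodes each number as a string over '0'-'9' with 'P' standing in for the word "point" (an encoding chosen here), and it reads the properties as applying to the 2-to-4-word numbers before the eleven single words are added.

```python
from collections import Counter

def number_set_stats(numbers):
    """Tally word, first-word, last-word, and transition counts.

    Single-word entries are skipped, since the listed properties are read
    here as describing only the 2-to-4-word numbers.
    """
    multi = [n for n in numbers if len(n) > 1]
    words = Counter(ch for n in multi for ch in n)
    first = Counter(n[0] for n in multi)
    last = Counter(n[-1] for n in multi)
    transitions = Counter(n[i:i + 2] for n in multi for i in range(len(n) - 1))
    return words, first, last, transitions

# Counts implied by properties 1-3.
multiword_numbers = 11 * 4                 # each of 11 words occurs first 4 times
total_words = 10 * 15 + 14                 # ten digits 15 times, "point" 14 times
expected_transitions = 10 * 10 + 10 + 10   # digit-digit, digit-point, point-digit
set_size = multiword_numbers + 11          # plus the eleven single-word entries

# Toy input (not a real MNSET list) showing the tallies.
words, first, last, transitions = number_set_stats(["12", "2P3", "P"])
```

A number of n words contributes n - 1 transitions, so the transition total must equal the word total minus the number of multi-word entries: 164 - 44 = 120, matching property 3.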

Created  By:  N/A 

Format:  Randomly  organized,  256  words/record 


Columns   Contents

1-4       Number, right-justified
5-6       Blanks
7-12      File name, ending with right-justified


Name: Card image file for the counter data file editor

File: RVCARDS

Description: The card images for the counter data file editor contain the
counter data record numbers to be kept in the new counter
data file to be created.

Created  By:  N/A 

Format:  Randomly  organized,  256  words/record 

Note:  The  counter  data  record  numbers  to  be  kept  are  written  from 

right  to  left  in  ascending  order. 


Columns   Contents

1-2       The number of entries on this card, right-justified
          (no more than 10 entries. Otherwise, it is blank and
          10 entries are on the card.)
          The last card contains -1 and the remainder of the
          card is blank.
3-4       Blanks
5-8       Counter data record number to be kept,
          right-justified, in ascending order
9-10      Blanks
11-14     Blanks or counter data record numbers to be kept,
17-20     right-justified for each entry, in ascending order
23-26
29-32
35-38
41-44
47-50
53-56
59-62
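
The column layout above can be read with fixed-width slicing, as in this sketch. It assumes that a blank count field means a full card of 10 entries and that the entry fields are filled from the lowest columns upward; `parse_rvcard`, `read_rvcards`, and `make_card` are hypothetical helpers, not VDGS routines.

```python
def parse_rvcard(card):
    """Parse one RVCARDS card image into a list of record numbers.

    Returns None for the terminating card (-1 in columns 1-2).
    """
    card = card.ljust(62)
    count_field = card[0:2].strip()
    if count_field == "-1":
        return None
    count = int(count_field) if count_field else 10   # blank: full card
    # Entry k occupies columns 5+6k .. 8+6k (1-based), k = 0..9.
    fields = [card[4 + 6 * k: 8 + 6 * k].strip() for k in range(10)]
    return [int(f) for f in fields[:count]]

def read_rvcards(lines):
    """Collect record numbers from successive cards up to the -1 card."""
    records = []
    for line in lines:
        entries = parse_rvcard(line)
        if entries is None:
            break
        records.extend(entries)
    return records

def make_card(numbers, count=None):
    """Build a card image for testing; count=None leaves columns 1-2 blank."""
    head = ("" if count is None else str(count)).rjust(2) + "  "
    return head + "  ".join(str(n).rjust(4) for n in numbers)

cards = [make_card(range(1, 11)), make_card([12, 15, 20], count=3), "-1"]
kept = read_rvcards(cards)
```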


Name: File of hand-cutting results

File: MEND.WD

Description: This user-created file holds all data gleaned from the
hand-cutting procedure, including the number of words in each
utterance and the start and end times for each word.

Created By: User

Format: Randomly organized, 256 words/record

           Columns   Contents

line 1 -   1-2       number of words in utterance, followed by a comma
           3-14      compressed data file name, including three-letter
                     subdirectory name, colon, data file name plus
                     -.CD extension

line 2 -   1-22      machine number, beginning time, end time
                     separated by commas for each word in the
                     utterance


A  typical  entry  for  ttie  compressed  data  file  A1234.CD  might  then  look  like 
this: 

4  ,  A 1 2  3  4 . CD 

1, 1,25,2,20,50,3,45,80,4,70, 100 
"I”  "2"  ”3”  ”4" 


t  4" 
Name:         RESCUE indexes identifying numbers of best transition letter
              sets

File:         REDEEM

Description:  For the CHAINMIND version of VDGS, REDEEM is a user created
              file holding the RESCUE indexes of each transition letter set
              to be chosen by RESCUE, one per line, in order, so that the
              first number corresponds to machine 0, the second to machine
              1, etc.

Created By:   User

Format:       Randomly organized, 256 words/record

              Columns   Contents

              1-2       RESCUE index for this item.
Name:         Data location file

File:         WHERE

Description:  For the CHAINMIND version of VDGS, the file WHERE holds the
              location of the counter data files created by REVEX.  This
              location is to be specified in the form disk unit:subdirectory
              name.

Created By:   User

Format:       Randomly organized, 256 words/record

              Columns   Contents

                        disk unit:subdirectory name


For example, a typical entry in WHERE might be DP2:USG
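
A WHERE entry of this form can be split at the colon. The short Python sketch below is illustrative only; the function name is hypothetical and no validation of RDOS unit or directory names is attempted.

```python
def parse_where_entry(entry):
    """Split a WHERE entry of the form 'disk unit:subdirectory name'."""
    unit, colon, subdirectory = entry.strip().partition(":")
    if not colon or not unit or not subdirectory:
        raise ValueError("expected 'unit:subdirectory', got %r" % entry)
    return unit, subdirectory


print(parse_where_entry("DP2:USG"))
```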
A.10  DATA FILES AND COMMAND FILES FOR VDGS PROCESSING

The  following  pages  contain  tables  of  important  data  files  used  during 
VDGS  processing,  compile  and  load  macros  for  all  VDGS  programs,  and  command 
files  for  the  execution  of  CHAINMIND. 
TABLE A4.  SINE QUA NON FILES OF VDGS PROCESSING

Routines        Input files                       Output files

EXTRACT         MNSET*                            -.CD, -.RD
ESG             PFILE, -.CD, WIZ.ST               ES$***$**
GZEC            ES$***$**, -.CD                   TRIX**.TM, TRLS**.TM
RESCUE          TRIX**.TM, TRLS**.TM, REDEEM      MC**.TL
SIGH            MC**.TL                           MC**.TL, MC**.XY
LOOPER          ES$***$**, -.CD, MC**.TL          MC**.LP
REVEXA          MNSET*, -.CD, MC**.TL, MC**.LP    CDAT.RV, CIDX.RV
RVDIT           CDAT.RV, CIDX.RV, (RVCARDS)       CDAT**.RV
COVERT          CDAT**.RV                         CM**, COV.ST
INVERT          CM**                              INCM**
CROAK           INCM**, CDAT**.RV, CIDX.RV,       QDAT**.RR, QDAT**.AF,
                COV.ST                            MUDT**.RR, MUDT**.AF,
                                                  RVX.ST
REVEX           INCM**, COV.ST, RVX.ST,           CDAT.RV, CIDX.RV
                MNSET*, MC**.TL, MC**.LP
ADDER           CDAT.RV, CIDX.RV                  LTVF.MM
AVRAJ           CDAT.RV, CIDX.RV                  AVRWRD.ST
CRAP            AVRWRD.ST, CDAT.RV, CIDX.RV       CONGAP, CAP.ST
GAPSTER         CONGAP, CAP.ST                    GAPMAX, GAP.DT, QASM.DT
SORTRA/SORTRB   GAPMAX/CONGAP                     GAPMAX/CONGAP
MUTE            MUDT**.RR, MUDT**.AF              LOOPY, QLSTATS
GLOVE           QDAT**.RR, QDAT**.AF              QDFT**.RR, QDFT**.AF,
                                                  APA, APR
TAILOR          QDFT**.RR, QDFT**.AF              WHAT
BUILDER         LOOPY, WHAT                       MDFL.MM
DEALER          CAP.ST, WHERE, LVTF.MM,           MIND.VD
                MC**.TL, MC**.LP, COV.ST,
                INCM**, RVX.ST, MDFL.MM,
                QASM.DT, GAP.DT
PHEW            MIND.VD, APA, APR                 MIND.VD

(Auxiliary Routines)

GASP            MC**.TL, MC**.LP                  none
ESDIT           ES$***$--                         ES$***$**
ESGDIT          ES$***$--                         ES$***$**
GWIZ            GWIZ.CD, -.CD                     none
MEND            MEND.WD                           ES$***$**



TABLE A5.  COMPILE AND LOAD MACROS FOR VDGS ROUTINES

Routines     Compile macro     Load macro

EXTRACT      EXTCP.XM          EXTLD.XM
ESG          ESGCP.XM          ESGLD.XM
GZEC         GENCP.XM          GENLD.XM
RESCUE       RESCP.XM          RESLD.XM
SIGH         SIGHCP.XM         SIGHLD.XM
LOOPER       LPCP.XM           LPLD.XM
REVEXA       RVXACP.XM         RVALD.XM
RVDIT        RVDCP.XM          RVDLD.XM
COVERT       COVCP.XM          COVLD.XM
INVERT       INVCP.XM          INVLD.XM
CROAK        CROCP.XM          CROLD.XM
REVEX        RVXCP.XM          RVLD.XM
ADDER        ADDERCP.XM        ADDERLD.XM
AVRAJ        AVRAJCP.XM        AVRAJLD.XM
CRAP         CRAPCP.XM         CRAPLD.XM
GAPSTER      GAPSTERCP.XM      GAPSTERLD.XM
SORTRA       SORTRACP.XM       SORTRALD.XM
SORTRB       SORTRBCP.XM       SORTRBLD.XM
MUTE         MUTECP.XM         MUTELD.XM
GLOVE        GLOVECP.XM        GLOVELD.XM
TAILOR       TAILORCP.XM       TAILORLD.XM
BUILDER      BUILDERCP.XM      BUILDERLD.XM
DEALER       DEALERCP.XM       DEALERLD.XM
PHEW         PHEWCP.XM         PHEWLD.XM
TABLE A6.  COMPILE AND LOAD MACROS FOR SPECIAL CHAINMIND ROUTINES

Routines     Compile macro     Load macro

ZESG         ZESGCP.XM         ZESGLD.XM
ZGZEC        ZGENCP.XM         ZGENLD.XM
ZRESCUE      ZRESCP.XM         ZRESLD.XM
ZSIGH        ZSIGHCP.XM        ZSIGHLD.XM
ZLOOPER      ZLPCP.XM          ZLPLD.XM
ZREVEXA      ZRVXACP.XM        ZRVALD.XM
ZRVDIT       ZRVDCP.XM         ZRVDLD.XM
ZCOVERT      ZCOVCP.XM         ZCOVLD.XM
ZINVERT      ZINVCP.XM         ZINVLD.XM
ZCROAK       ZCROCP.XM         ZCROLD.XM
ZREVEX       ZRVXCP.XM         ZRVLD.XM
ZADDER       ZADDERCP.XM       ZADDERLD.XM
ZAVRAJ       ZAVRAJCP.XM       ZAVRAJLD.XM
ZCRAP        ZCRAPCP.XM        ZCRAPLD.XM
ZGAPSTER     ZGAPSTERCP.XM     ZGAPSTERLD.XM
ZMUTE        ZMUTECP.XM        ZMUTELD.XM
ZGLOVE       ZGLOVECP.XM       ZGLOVELD.XM
ZTAILOR      ZTAILORCP.XM      ZTAILORLD.XM
ZDEALER      ZDEALERCP.XM      ZDEALERLD.XM
ZPHEW        ZPHEWCP.XM        ZPHEWLD.XM
TABLE A7.  COMPILE AND LOAD MACROS FOR AUXILIARY ROUTINES

Routines     Compile macro     Load macro

ESDIT        ESDCP.XM          ESDLD.XM
ESGDIT       ESG1CP.XM         ESG1LD.XM
GASP         GSPCP.XM          GSPLD.XM
GWIZ         GWZCP.XM          GWZLD.XM
MEND         MENCP.XM          MENLD.XM
TABLE A8.  DATA FILES DELIVERED WITH VDGS


MNSET*  -  magic number set files

WIZ.ST  -  file of length and stretch factors

PFILE   -  prompting file used by ZESG in CHAINMIND

TABLE A9.  COMMAND FILES FOR CHAINMIND

/3-26-79

/GENTL - MACRO TO CREATE EXAMPLE SPACES AND TRANSITION LETTER SETS
/THIS IS THE FIRST MACRO OF CHAINMIND.

DELETE ES$-.-
DELETE TRIX-.TM TRLS-.TM
MESSAGE START PROGRAM ESG
ZESG

MESSAGE START PROGRAM GZEC
ZGZEC


/3-26-79

/MAKEMIND  -  MACRO  TO  BUILD  MIND  FILE  GIVEN  EXAMPLE  SPACES  AND  TRANSITION  LETTER  SETS 
/THIS  IS  THE  SECOND  MACRO  IN  CHAINMIND. 

DELETE  MC-.TL 

MESSAGE  START  PROGRAM  RESCUE 
ZRESCUE 

MESSAGE  START  PROGRAM  SIGH 
ZSIGH 

DELETE MC-.XY MC-.LP
MESSAGE  START  PROGRAM  LOOPER 
ZLOOPER 

DELETE CDAT-.- CIDX.RV
MESSAGE  START  PROGRAM  REVEXA 
ZREVEXA 

MESSAGE  START  PROGRAM  RVDIT 
ZRVDIT 

MESSAGE  START  PROGRAM  COVERT 
ZCOVERT 

MESSAGE  START  PROGRAM  INVERT 
ZINVERT 

MESSAGE  START  PROGRAM  CROAK 
ZCROAK 

DELETE CDAT-.- CIDX.RV QDAT-.- MUDT-.- RVX.ST
MESSAGE START PROGRAM REVEX
ZREVEX

MESSAGE START PROGRAM RVDIT
ZRVDIT 

MESSAGE  START  PROGRAM  CROAK 
ZCROAK 

MESSAGE  START  PROGRAM  ADDER 
ZADDER 

MESSAGE  START  PROGRAM  AVRAJ 
ZAVRAJ

MESSAGE  START  PROGRAM  CRAP 
ZCRAP 

MESSAGE  START  PROGRAM  GAPSTER 
ZGAPSTER 

MESSAGE  START  PROGRAM  SORTRA 
SORTRA 

MESSAGE  START  PROGRAM  SORTRB 
SORTRB 

DELETE GAPMAX QASM.DT GAP.DT
MESSAGE  START  PROGRAM  GAPSTER 
ZGAPSTER 
DELETE  LOOPY 

MESSAGE  START  PROGRAM  MUTE 
ZMUTE 

DELETE  APA  APR 

MESSAGE  START  PROGRAM  GLOVE 

ZGLOVE 

DELETE WHAT

MESSAGE  START  PROGRAM  TAILOR 
TAILOR 

MESSAGE  START  PROGRAM  BUILDER 
BUILDER 

MESSAGE  START  PROGRAM  DEALER 
ZDEALER 

MESSAGE  START  PROGRAM  PHEW 
ZPHEW 
APPENDIX B


PERFORMANCE ANALYSIS SUBSYSTEM USERS MANUAL

B.1  PROGRAM DESCRIPTIONS


The  following  pages  include  descriptions  of  the  four  programs  which 
comprise  the  Performance  Analysis  Subsystem  (PASS)  of  VIAS.  These  programs 
are  designed  to  exercise  BIGMINT,  the  research  version  of  MINT,  and  to 
analyze  the  data  collected  by  BIGMINT. 
BIGMINT


Title:  BIGMINT.SV

Purpose:

The purpose of BIGMINT is to find the 10 best explanations
of each utterance, as viewed by the MINT algorithm.


Printout : 

For  each  utterance: 

1)  A  table  of  properties  and  costs  associated  with 
each  node  in  the  utterance. 

2)  The  IQGAP  matrix,  showing  which  gap  costs  were 
computed  and  the  resulting  costs. 

3)  A  table  of  the  ten  best  paths. 


User  Dialog: 


BIGMINT <MEX DATA PACKETS>/I <OUTPUT FILE>/O <MIND FILE>/D
<NUMBER OF UTTERANCES TO PROCESS>/N [LISTING FILE]/L

STOP  ALL  DONE 

Global  Switches: 

/A  Use  STATSUM  type  MIND  file 

/P  Print  the  MIND  file 

Note:   If the /A option is not used, BIGMINT
        will create a STATSUM type MIND file
        named "STATSUM.VD".


Local Switches:

/D      MIND file of either type

/I      input file name of data packets
        created by MEX using the global
        /A option.

/O      output file of the ten best paths
        for use in STATSUM.  The output
        file must exist.

/N      the maximum number of data packets to
        be processed.

Optional:

/L      listing file name, default is the line printer.

Input Files:

MIND file         -  VDGS generated voice data file.

data packet file  -  output from MEX with global /A option.

Output Files:

recognition file  -  (-.RE) contains the 10 best paths found by BIGMINT
                     for later use in STATSUM.

Error  Messages: 

STOP  NO  MIND  FILE  GIVEN 

(This  occurs  when  no  MIND  file  is  given  in  the  command  line.) 

STOP  RECOGNITION  FILE  OPEN  ERROR 

(This  occurs  when  the  recognition  file  specified  in  the  command  line 
does  not  exist.) 

NON-MATCHING NUMBER OF MACHINES BETWEEN VOICE DATA FILE: <MIND FILE>
AND MINT, NAMELY <MINTS #> AND <MIND FILES #>

(If the number of machine types MINT expects is not equal to the
number of machine types the MIND file was created for, MINT won't
run.  To correct this, recompile MINT with a new value for
parameter "MACHN".)

NON-MATCHING REVISION KEYS BETWEEN VOICE DATA FILE: <MIND FILE>
AND MINT, NAMELY <MIND FILES REVISION KEY> AND <MINTS REVISION KEY>

(If these keys are different, it means that the MIND file and MINT
are expecting different formats of the voice data.)

STOP - END OF DATA PACKETS

(This happens when the number of packets to process specified
in the command line with the local /N is greater than the
number of utterances in the input file.)
STATSUM


Title:  STATSUM.SV

Purpose:

The purpose of STATSUM is to break up the decision-making
process used in the MINT algorithm into its component parts.
This  enables  the  user  to  examine  the  contribution  of  each 
of  the  MINT  cost  functions. 

STATSUM  has  three  main  functions: 

1 )  To  determine  the  recognition  error  category 
and  type  for  each  utterance. 

2)  To  gather  data  for  later  use  in  SSPLOT. 

3)  To  gather  data  for  use  in  LICVAT. 


Printout : 

For  each  utterance: 

1)  The  total  cost  over  each  path  of  each  cost  component. 

2)  The  category  and  type  of  the  "toughest  critical  decision" 

3)  The  difference  in  total  costs  between  each  incorrect  path 
and  the  best  correct  path. 


User  Dialog: 

STATSUM <MEX DATA PACKETS>/I <OUTPUT FILE>/O <MIND FILE>/D
[LISTING FILE]/L

STOP  -  STATSUM  ALL  DONE 

Global Switches:

/A      use STATSUM type MIND file

At      create the data for SSPLOT

/P      print the MIND file

/Q      generate the data for LICVAT

Note:   If the /A option is not used, STATSUM
        will create a STATSUM type MIND file
        named "STATSUM.VD".

Local Switches:

/D      MIND file of either type

/I      input file name of data packets
        created by MEX using the global
        /A option.

/O      output file of the ten best paths
        for use in STATSUM.  The output
        file must exist.


Optional : 


/L  listing  file  name,  default  is  the  line  printer. 


Input Files:

MIND file         -  (-.VD) VDGS-generated voice data file.

data packet file  -  (-.PK) output from MEX with global
                     /A option.

recognition file  -  (-.RE) contains the 10 best paths found by
                     BIGMINT

STATSUM.NM        -  this file contains the base of
                     the temporary files that STATSUM
                     will append data to for later
                     SSPLOT and LICVAT.

                     Note:  STATSUM.NM should contain "XXX."
                     where XXX are any three valid characters
                     for RDOS filenames.

Output Files:

XXX.00               temporary files to store data for
XXX.01               use in SSPLOT and LICVAT


XXX.12

SSCOUNTER         -  contains counts of gap occurrences
                     intrinsic properties

Note:  SSCOUNTER  and  the  other  temporary  files  are  appended  to  and 

should  be  deleted  and  created  before  each  block  of  data 
(test  data,  interim  test  data,  etc.)  that  PASS  is  run  over. 
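
Because these files are appended to, it can help to list them before clearing them for a new block of data. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the extensions run consecutively from 00 through 12, although the listing above shows only 00, 01 and 12.

```python
def statsum_temp_files(root, count=13):
    """Return the temporary file names built on a STATSUM.NM root.

    `root` is the "XXX." string: three characters followed by a period.
    Assumption: extensions are the two-digit numbers 00 through 12.
    """
    if len(root) != 4 or not root.endswith("."):
        raise ValueError("root must be three characters followed by '.'")
    return ["%s%02d" % (root, index) for index in range(count)]


# Files to delete and recreate, together with SSCOUNTER, before a new run.
for name in statsum_temp_files("ABC.") + ["SSCOUNTER"]:
    print(name)
```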

Error Messages:

STOP NO MIND FILE GIVEN

(This occurs when no MIND file is given in the command line.)

STOP RECOGNITION FILE OPEN ERROR

(This occurs when the recognition file specified in the command line
does not exist.)

NON-MATCHING NUMBER OF MACHINES BETWEEN VOICE DATA FILE: <MIND FILE>
AND STATSUM, NAMELY <STATSUMS #> AND <MIND FILES #>

(If the number of machine types STATSUM expects is not equal to the
number of machine types the MIND file was created for, STATSUM won't
run.  To correct this, recompile STATSUM with a new value for
parameter "MACHN".)

NON-MATCHING REVISION KEYS BETWEEN VOICE DATA FILE: <MIND FILE>
AND STATSUM, NAMELY <MIND FILES REVISION KEY> AND
<STATSUMS REVISION KEY>

(If these keys are different, it means that the MIND file and STATSUM
are expecting different formats of the voice data.)

STOP - NON MATCHING PACKETS AND RECOGNITION - PATHREAD

(This occurs when the MEX data packets and BIGMINT recognition
data were created from different data.)
SSPLOT


Title:  SSPLOT.SV

Purpose:


The  purpose  of  SSPLOT  is  to  plot  cumulative  distributions 
of  each  of  the  individual  cost  components  used  in  the  MINT 
algorithm,  to  evaluate  the  usefulness  of  each  cost  function 
as  an  information  source,  and  to  give  a  list  of  each  interesting 
group  (category  and  type)  of  errors  for  several  magic  number  sets 

Printout: 

For  each  category  and  type  and  cost  of  interest: 

1)  An ordered list of the utterances and cost differences used
in each plot

2)  A cumulative plot of the cost differences (if possible).

3)  The amount of information contained in this cost.

User Dialog:

SSPLOT 

ENTER  DESCRIPTION  OF  PLOTS 

DO  YOU  WANT  TO  PLOT  ALL  COSTS  (Y  OR  N)? 

(A  "Y"  answer  causes  a  separate  plot  to  be  made  for  each  cost 
component,  an  "N"  will  get  the  following  question:) 

ENTER COST # (1 - 17) THAT YOU WANT PLOTTED
OR -1 TO END


(The user may enter any or all of the costs, one at a time.
After each number entered the question will be repeated
until a "-1" is entered.)

DO  YOU  WANT  A  PLOT  OF  ALL  CATEGORIES  (Y  OR  N)? 

(If  you  answer  "Y"  all  categories  will  be  plotted  and  the  next 
question  will  be  skipped.  If  you  answer  "N"  the  following 
question  will  appear.) 

ENTER  THE  CATEGORY  TO  BE  PLOTTED  OR  -1  TO  END 

(Here you enter the categories you want, one at a time; the program
will repeat the question after each entry, until you enter "-1"
to end.)

DO  YOU  WANT  A  PLOT  OF  ALL  TYPES  IN  CATEGORY  1  (Y/N)? 

( "Y"  gets  a  separate  plot  for  each  cost  for  insertions,  deletions, 
and  substitutions,  an  "N"  gets  the  following  question:) 

ENTER TYPE: 0 - (0,1), 1 - (1,0), 2 - (1,1), -1 - GO ON

(You  enter  -1,  0,  1,  or  2  depending  on  the  type  you  want. 

This  question  will  also  repeat  until  you  enter  -1.) 

DO  YOU  WANT  A  PLOT  OF  INCORRECT  RECOGNITIONS  ONLY  ON  THE 
SAME  AXIS?  (Y/N) 

(Enter  "Y"  or  "N") 

ENTER  THE  SCALE  OF  THE  PLOT 


(Enter an integer (10 is a nice number) for the scale of the
cost  axis  of  the  plots.) 

STOP  -  SSPLOT  ALL  DONE 

Global Switches:

/A      This will cause every possible cost, category, type
        plot to be done with incorrect only on same axis
        and a scale of 10, and will eliminate all of the above
        questions.

Input Files:

STATSUM.NM  -  This file contains the root of the
               temporary files ("XXX.-") used
               by SSPLOT.

XXX.-       -  STATSUM-generated data files.

Output Files:

None

Error Messages:

None
LICVAT


Title:  LICVAT.SV

Purpose : 

The  purpose  of  LICVAT  is  to  test  certain  assumptions  about 
the  distribution  of  some  of  the  properties  and  costs  used 
in  LISTEN.  Namely,  the  occurrence  of  each  violation  category,  the 
L-counter  costs,  the  inter-word  gap  lengths,  and  the  frequency 
of  association  between  each  pair  of  machine  types. 

Printout: 

1)  A table of counts of violation categories by machine type
for real recognitions and artifact nodes.

2)  A table of association counts machine type by machine type
for  both  reals  and  artifacts. 

3)  A  cumulative  plot  of  a  function  of  L-counter 

values  designed  to  produce  a  rectangular  distribution 
for  reals  and  artifacts. 

4)  A  table  of  start  gap  data  by  machine  type 

5)  A  table  of  end  gap  data  by  machine  type 

6)  A  cumulative  plot  of  a  function  of  gap  values,  designed  to 
give  a  rectangular  distribution  for  reals  and  artifacts. 

User  Dialog: 

LICVAT

STOP ALL DONE

Global Switches:

/C      print the counts of violation categories and associations

/L      print the cumulative plot for L-counter data

/Q      print the start-end gap count data and the plot
        of the adjusted gap function

Note:   If no global switch is given, LICVAT will do nothing

Input Files:

SSCOUNTER   -  the accumulated counts for violation,
               association, and gap data; STATSUM produced.

STATSUM.NM  -  holds the root of the name for the temporary
               files (XXX.-)

QLSTATS     -  created by MUTE, this file contains data
               necessary to compute the L-counter plot

XXX.-       -  the temporary files created by STATSUM

Output Files:

None

Error Messages:

None


